flutter_openai_realtime_api 0.1.2
Flutter client for the OpenAI Realtime GA API. WebRTC and WebSocket transports for low-latency voice and text conversations.
flutter_openai_realtime_api #
Flutter client for the OpenAI Realtime GA API. Targets gpt-realtime-2
and the rest of the May 2026 gpt-realtime-* lineup over WebRTC for
low-latency voice conversations and over WebSocket for server-side or
text-only use.
The OpenAI Realtime API exposes two transports: WebSocket (text + base64 PCM) and WebRTC (voice over a real RTP track, with native echo cancellation and the lowest end-to-end latency). Other Dart/Flutter clients implement WebSocket only — this package is the only one that implements the WebRTC variant, which is what the Realtime API was designed for in the voice case.
Full API reference: https://pub.dev/documentation/flutter_openai_realtime_api/latest/
Install #
flutter pub add flutter_openai_realtime_api
Or add manually:
dependencies:
  flutter_openai_realtime_api: ^0.1.2
Quick start #
A Flutter client connecting via WebRTC needs an ephemeral token from your backend (see Backend setup, below). With a token provider in hand:
import 'package:flutter_openai_realtime_api/flutter_openai_realtime_api.dart';
final client = RealtimeClient.webRtc(RealtimeConfig(
  tokenProvider: myTokenProvider,
  voice: Voice.alloy,
  instructions: 'You are a helpful, concise assistant.',
  turnDetection: const ServerVad.quick(),
));
client.events.listen((event) {
  if (event is InputAudioTranscriptionCompleted) {
    print('You said: ${event.transcript}');
  } else if (event is ResponseAudioTranscriptDone) {
    print('AI said: ${event.transcript}');
  }
});
await client.connect();
// ... talk normally; server VAD turns over automatically ...
await client.dispose();
The same client also supports text-only turns:
await client.sendMessage('Summarise the conversation so far.');
WebSocket transport #
For server-side Dart (Cloud Functions, Shelf, CLIs) or text-only Flutter
clients, use the WebSocket transport. Browsers cannot set the
Authorization header on a WebSocket upgrade, so this transport is not
appropriate for Flutter Web.
final client = RealtimeClient.webSocket(RealtimeConfig(
  apiKey: openAiKey, // long-lived sk-… key, server-side only
));
await client.connect();
await client.sendMessage('Generate a haiku about caching.');
await client.dispose();
Choose WebRTC for voice from a Flutter app; choose WebSocket for server-side workloads or when you only need text round-trips.
Backend setup #
Long-lived OpenAI keys (sk-…) must never ship in a Flutter binary —
anyone who unpacks the app gets the key. The Realtime API solves this
with ephemeral client secrets (ek_…): short-lived strings your
backend mints on behalf of an authenticated user, used as the
Authorization: Bearer token only for the SDP exchange.
Flutter client            Your backend               OpenAI
|                         |                          |
| POST /realtime/token -->|                          |
| (Firebase ID token)     |                          |
|                         | POST /v1/realtime/       |
|                         |   client_secrets         |
|                         | Bearer sk-... ---------->|
|                         | <--- {value: ek_...}     |
| <-- {token: ek_...,     |                          |
|      expiresAt: ...}    |                          |
|                         |                          |
| POST /v1/realtime/calls (Bearer ek_...) ---------->| (SDP exchange)
The OpenAI endpoint #
POST https://api.openai.com/v1/realtime/client_secrets
Authorization: Bearer <YOUR_OPENAI_API_KEY>
Content-Type: application/json
Request body:
{
  "expires_after": { "anchor": "created_at", "seconds": 120 },
  "session": {
    "type": "realtime",
    "model": "gpt-realtime-2"
  }
}
Response body:
{
  "value": "ek_68af296e8e408191a1120ab6383263c2",
  "expires_at": 1735776000,
  "session": { "id": "sess_…", "...": "fully-resolved server view" }
}
expires_after.seconds accepts 10–7200 (default 600). Keep it short
(60–180): the token only needs to live long enough to complete the SDP
exchange. Once the WebRTC connection is up, the call runs to OpenAI's
hard 30-minute server cap regardless of token TTL.
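The TTL bounds above can be enforced before the mint request is built. The following is a minimal TypeScript sketch, not part of the package: the helper names (clampTtlSeconds, expiresAfter, expiresAtToDate) are hypothetical, and the 10–7200 range and default of 600 are taken from the paragraph above.

```typescript
// Bounds and default as stated in this README for expires_after.seconds.
const MIN_TTL_SECONDS = 10;
const MAX_TTL_SECONDS = 7200;
const DEFAULT_TTL_SECONDS = 600;

// Hypothetical helper: clamp a requested TTL into the accepted range.
function clampTtlSeconds(requested?: number): number {
  if (requested === undefined || !isFinite(requested)) {
    return DEFAULT_TTL_SECONDS;
  }
  const whole = Math.floor(requested);
  return Math.min(MAX_TTL_SECONDS, Math.max(MIN_TTL_SECONDS, whole));
}

// Hypothetical helper: build the expires_after object for the request body.
function expiresAfter(seconds?: number): { anchor: string; seconds: number } {
  return { anchor: "created_at", seconds: clampTtlSeconds(seconds) };
}

// expires_at in the response is in Unix seconds; JavaScript dates use milliseconds.
function expiresAtToDate(expiresAtSeconds: number): Date {
  return new Date(expiresAtSeconds * 1000);
}
```

Clamping on the server means a misconfigured caller cannot request a long-lived token by accident.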
Firebase Cloud Functions example #
If your app already uses Firebase Auth, this is the most idiomatic backend.
// functions/src/realtimeToken.ts
import { onCall, HttpsError } from "firebase-functions/v2/https";
import { defineSecret } from "firebase-functions/params";
import { getFirestore, FieldValue } from "firebase-admin/firestore";
import { initializeApp, getApps } from "firebase-admin/app";
import { logger } from "firebase-functions/v2";
if (!getApps().length) initializeApp();
// firebase functions:secrets:set OPENAI_API_KEY
const OPENAI_API_KEY = defineSecret("OPENAI_API_KEY");
export const realtimeToken = onCall(
  {
    region: "us-central1",
    secrets: [OPENAI_API_KEY],
    enforceAppCheck: true,
  },
  async (request) => {
    if (!request.auth) {
      throw new HttpsError("unauthenticated", "Sign in required.");
    }
    const uid = request.auth.uid;
    // Per-user rate limit (1 mint / 5s).
    const db = getFirestore();
    const ref = db.doc(`rateLimits/realtimeToken/${uid}/state`);
    const now = Date.now();
    await db.runTransaction(async (tx) => {
      const snap = await tx.get(ref);
      const last = (snap.data()?.lastMintMs as number | undefined) ?? 0;
      if (now - last < 5_000) {
        throw new HttpsError("resource-exhausted", "Slow down.");
      }
      tx.set(ref, { lastMintMs: now, updatedAt: FieldValue.serverTimestamp() });
    });
    const r = await fetch("https://api.openai.com/v1/realtime/client_secrets", {
      method: "POST",
      headers: {
        Authorization: `Bearer ${OPENAI_API_KEY.value()}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        expires_after: { anchor: "created_at", seconds: 120 },
        session: {
          type: "realtime",
          model: "gpt-realtime-2",
          audio: { output: { voice: "alloy" } },
        },
      }),
    });
    if (!r.ok) {
      logger.error("openai_mint_failed", {
        status: r.status,
        body: await r.text(),
      });
      throw new HttpsError("unavailable", "Upstream error.");
    }
    const data = (await r.json()) as { value: string; expires_at: number };
    logger.info("minted", { uid, expiresAt: data.expires_at });
    return { token: data.value, expiresAt: data.expires_at };
  },
);
Calling it from Flutter:
import 'package:cloud_functions/cloud_functions.dart';
import 'package:flutter_openai_realtime_api/flutter_openai_realtime_api.dart';
class FirebaseTokenProvider implements EphemeralTokenProvider {
  @override
  Future<EphemeralToken> getToken() async {
    final result = await FirebaseFunctions.instance
        .httpsCallable('realtimeToken')
        .call();
    return EphemeralToken(
      value: result.data['token'] as String,
      // expiresAt arrives in Unix seconds; DateTime expects milliseconds.
      expiresAt: DateTime.fromMillisecondsSinceEpoch(
        (result.data['expiresAt'] as int) * 1000,
      ),
    );
  }
}
final provider = CachingEphemeralTokenProvider(
  fetcher: FirebaseTokenProvider().getToken,
);
final client = RealtimeClient.webRtc(RealtimeConfig(
  tokenProvider: provider,
  // ...
));
CachingEphemeralTokenProvider reuses the cached token until it is
within 10 s of expiry, and de-duplicates concurrent in-flight fetches.
defineSecret keeps the OpenAI key in Google Secret Manager so it
never appears in source control or build logs. enforceAppCheck: true
restricts the function to your real app binaries.
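The caching behaviour described above (reuse until close to expiry, share one in-flight fetch) is a general pattern. The sketch below illustrates that pattern in TypeScript; it is not the package's Dart implementation, and the class name, field names, and injectable clock are all assumptions made for the example.

```typescript
interface EphemeralToken {
  value: string;
  expiresAtMs: number; // absolute expiry, in milliseconds since the epoch
}

// Illustrative only: a cache that refreshes within a margin of expiry and
// shares a single pending fetch between concurrent callers.
class CachingTokenProvider {
  private cached: EphemeralToken | null = null;
  private inFlight: Promise<EphemeralToken> | null = null;
  private readonly fetcher: () => Promise<EphemeralToken>;
  private readonly refreshMarginMs: number;
  private readonly now: () => number;

  constructor(
    fetcher: () => Promise<EphemeralToken>,
    refreshMarginMs: number = 10_000,
    now: () => number = () => Date.now(),
  ) {
    this.fetcher = fetcher;
    this.refreshMarginMs = refreshMarginMs;
    this.now = now;
  }

  getToken(): Promise<EphemeralToken> {
    const cached = this.cached;
    // Reuse the cached token while it is further than the margin from expiry.
    if (cached !== null && cached.expiresAtMs - this.now() > this.refreshMarginMs) {
      return Promise.resolve(cached);
    }
    // A fetch is already running: hand every caller the same promise.
    if (this.inFlight !== null) {
      return this.inFlight;
    }
    const pending = this.fetcher().then(
      (token) => {
        this.cached = token;
        this.inFlight = null;
        return token;
      },
      (err) => {
        this.inFlight = null; // allow a retry after a failed fetch
        throw err;
      },
    );
    this.inFlight = pending;
    return pending;
  }
}
```

Injecting the clock keeps the expiry logic testable without waiting for real time to pass.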
Express / FastAPI / other backends #
The contract is identical: authenticate the user, call
POST /v1/realtime/client_secrets, return {token, expiresAt} to the
client. Any HTTP server works.
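Because the contract is framework-independent, the request and response handling can be kept as plain functions and wired into any server. This is a hedged TypeScript sketch: buildMintRequest and toClientPayload are hypothetical names, and the request and response shapes are copied from "The OpenAI endpoint" section above.

```typescript
interface MintRequest {
  url: string;
  method: "POST";
  headers: { [name: string]: string };
  body: string;
}

interface ClientPayload {
  token: string;
  expiresAt: number; // Unix seconds, passed through from expires_at
}

// Build the upstream request. The API key must come from server-side config.
function buildMintRequest(apiKey: string, model: string, ttlSeconds: number): MintRequest {
  return {
    url: "https://api.openai.com/v1/realtime/client_secrets",
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      expires_after: { anchor: "created_at", seconds: ttlSeconds },
      session: { type: "realtime", model: model },
    }),
  };
}

// Map the upstream response to the {token, expiresAt} shape the client expects.
function toClientPayload(responseJson: unknown): ClientPayload {
  if (typeof responseJson !== "object" || responseJson === null) {
    throw new Error("Unexpected client_secrets response shape");
  }
  const data = responseJson as { value?: unknown; expires_at?: unknown };
  if (typeof data.value !== "string" || typeof data.expires_at !== "number") {
    throw new Error("Unexpected client_secrets response shape");
  }
  return { token: data.value, expiresAt: data.expires_at };
}
```

A route handler would authenticate the user first, send the built request with the HTTP client of its choice, and return toClientPayload(...) of the parsed JSON.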
CORS for Flutter Web #
Native HTTP requests are not subject to CORS. Flutter Web is. For Web clients:
- Set Access-Control-Allow-Origin to your exact app origin.
- Handle the OPTIONS preflight for POST with Authorization and Content-Type headers. Firebase callables handle this automatically; for raw HTTP functions enable the cors option.
- The SDP exchange (POST /v1/realtime/calls) goes directly from the browser to OpenAI; there is no need to proxy it through your backend.
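For a hand-rolled token endpoint, the first two points above could look roughly like this. It is a sketch under assumptions: the function names are hypothetical, the 204/403 status codes are one reasonable choice rather than a requirement, and the example origin in the usage note is a placeholder.

```typescript
interface PreflightResult {
  status: number;
  headers: { [name: string]: string };
}

// Return CORS headers only when the request origin matches exactly.
function corsHeadersFor(
  requestOrigin: string | undefined,
  allowedOrigin: string,
): { [name: string]: string } | null {
  if (requestOrigin !== allowedOrigin) {
    return null;
  }
  return {
    "Access-Control-Allow-Origin": allowedOrigin,
    "Access-Control-Allow-Methods": "POST, OPTIONS",
    "Access-Control-Allow-Headers": "Authorization, Content-Type",
    Vary: "Origin",
  };
}

// Answer an OPTIONS preflight; returns null for any other method so the
// caller can continue with normal request handling.
function handlePreflight(
  method: string,
  requestOrigin: string | undefined,
  allowedOrigin: string,
): PreflightResult | null {
  if (method !== "OPTIONS") {
    return null;
  }
  const headers = corsHeadersFor(requestOrigin, allowedOrigin);
  if (headers === null) {
    return { status: 403, headers: {} };
  }
  return { status: 204, headers: headers };
}
```

For example, handlePreflight("OPTIONS", origin, "https://app.example.com") yields a 204 with the allow headers only when origin is exactly that string; the same corsHeadersFor result should also be attached to the actual POST response.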
Production checklist #
- [ ] OpenAI key stored in a real secret manager.
- [ ] Token endpoint requires an authenticated user.
- [ ] Per-user and per-IP rate limits.
- [ ] Per-user daily/monthly minute quota.
- [ ] expires_after.seconds set to 60–180.
- [ ] CORS configured for your Flutter Web origin (if applicable).
- [ ] App Check (or equivalent attestation).
- [ ] Server-side allowlist for session.* fields the client may set.
- [ ] No sk-… value in the client repo or build artifacts.
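The allowlist item is the least obvious one on the list, so here is one way it might be approached. This TypeScript sketch assumes the client is allowed to choose only a voice and an instructions string; the field choice, the length cap, and the function name are all hypothetical, and the voice list is the set this README says works with every current model.

```typescript
const ALLOWED_VOICES = [
  "alloy", "ash", "ballad", "coral", "echo", "sage", "shimmer", "verse",
];
const MAX_INSTRUCTIONS_LENGTH = 2000; // hypothetical cap

interface RealtimeSession {
  type: "realtime";
  model: string;
  instructions?: string;
  audio?: { output: { voice: string } };
}

// Build the session object from server-controlled defaults, copying across
// only the client-supplied fields that are explicitly allowed.
function buildSession(requested: { [key: string]: unknown }): RealtimeSession {
  // The model is fixed server-side; a client-supplied model is ignored.
  const session: RealtimeSession = { type: "realtime", model: "gpt-realtime-2" };

  const voice = requested["voice"];
  if (typeof voice === "string" && ALLOWED_VOICES.indexOf(voice) !== -1) {
    session.audio = { output: { voice: voice } };
  }

  const instructions = requested["instructions"];
  if (typeof instructions === "string") {
    session.instructions = instructions.slice(0, MAX_INSTRUCTIONS_LENGTH);
  }

  // Everything else in `requested` (tools, model, and so on) is dropped.
  return session;
}
```

Starting from an empty server-owned object and copying fields in is safer than deleting disallowed fields from the client's object, because new or unexpected keys are rejected by default.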
Platform setup #
iOS #
ios/Runner/Info.plist:
<key>NSMicrophoneUsageDescription</key>
<string>Used for voice conversations with the AI assistant.</string>
For background audio:
<key>UIBackgroundModes</key>
<array><string>audio</string><string>voip</string></array>
Android #
android/app/src/main/AndroidManifest.xml:
<uses-permission android:name="android.permission.RECORD_AUDIO" />
<uses-permission android:name="android.permission.INTERNET" />
<uses-permission android:name="android.permission.MODIFY_AUDIO_SETTINGS" />
android/app/build.gradle:
android { defaultConfig { minSdkVersion 21 } } // flutter_webrtc minimum
Use permission_handler (or similar) at runtime to request microphone
permission before client.connect().
Web #
Must be served over HTTPS (or localhost). Browsers will not autoplay
audio until the page has had a user gesture, so gate connect()
behind a button tap.
The package does not auto-attach the remote audio track to a DOM
<audio> element on Web — Flutter's rendering layer cannot reach the
browser's audio output without help. In your app, listen for the
onTrack event on the underlying RTCPeerConnection (exposed via
flutter_webrtc) and route the remote MediaStream to an <audio>
element through flutter_webrtc's renderer or a small JS interop call.
Native platforms wire this automatically.
macOS #
macos/Runner/Info.plist:
<key>NSMicrophoneUsageDescription</key>
<string>Used for voice conversations.</string>
Apply the same three keys to both macos/Runner/DebugProfile.entitlements
and macos/Runner/Release.entitlements:
<key>com.apple.security.device.audio-input</key>
<true/>
<key>com.apple.security.network.client</key>
<true/>
<key>com.apple.security.network.server</key>
<true/>
network.client is required for the outbound HTTPS SDP exchange.
network.server is required for WebRTC media: the sandboxed app must
accept inbound UDP for ICE/RTP to flow.
The example app at example/macos/ ships with these set up — copy it
as a working reference if you'd rather not edit the plists by hand.
Voice and model selection #
| Field | Default | Choices |
|---|---|---|
| model | gpt-realtime-2 | See the model table below. |
| voice | (server default) | alloy, ash, ballad, coral, echo, sage, shimmer, verse, marin (gpt-realtime), cedar (gpt-realtime) |
marin and cedar work only with gpt-realtime; the other eight work
with every current model.
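The compatibility rule above is easy to check before a session is configured. The following TypeScript sketch is illustrative only: isVoiceSupported is a hypothetical helper, and it assumes "gpt-realtime" means that exact model ID (whether the dated gpt-realtime snapshot also accepts marin and cedar is not stated here).

```typescript
const UNIVERSAL_VOICES = [
  "alloy", "ash", "ballad", "coral", "echo", "sage", "shimmer", "verse",
];
const GPT_REALTIME_ONLY_VOICES = ["marin", "cedar"];

// Returns true when the voice may be used with the given model ID,
// following the rule stated in this README.
function isVoiceSupported(voice: string, model: string): boolean {
  if (UNIVERSAL_VOICES.indexOf(voice) !== -1) {
    return true;
  }
  if (GPT_REALTIME_ONLY_VOICES.indexOf(voice) !== -1) {
    return model === "gpt-realtime"; // assumption: exact ID match only
  }
  return false; // unknown voice name
}
```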
Realtime models #
| Model ID | Notes |
|---|---|
| gpt-realtime-2 | Default. Reasoning S2S model (released 2026-05-07). 128k context, configurable reasoning effort. |
| gpt-realtime | Rolling alias for the previous GA speech-to-speech model. |
| gpt-realtime-1.5 | Non-reasoning S2S model tuned for the lowest latency (released 2026-02-24). |
| gpt-realtime-mini | Rolling alias for the smaller/cheaper mini variant. |
| gpt-realtime-mini-2025-12-15 | Dated mini snapshot. |
| gpt-realtime-mini-2025-10-06 | Dated mini snapshot. |
| gpt-realtime-2025-08-28 | Dated snapshot of the base gpt-realtime. |
| gpt-realtime-translate | Specialised model for live speech-to-speech translation (70+ in / 13 out languages, released 2026-05-07). |
| gpt-realtime-whisper | Specialised low-latency streaming speech-to-text (released 2026-05-07). Returns transcripts, not synthesised audio. |
The legacy gpt-4o-realtime-preview-* previews shut down on
2026-05-07 and are not supported.
Echo cancellation on Android #
Android's getUserMedia echo cancellation does not reliably stop
loudspeaker audio from being picked up by the mic when the user is not
wearing headphones. MuteStrategy.aggressive mitigates this by
replacing the outbound audio track with null while the assistant is
speaking, which stops RTP entirely. MuteStrategy.auto (the default)
uses aggressive on Android and standard everywhere else.
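To make the mechanism concrete, here is a rough TypeScript sketch of the two ideas in that paragraph: resolving auto per platform, and detaching the microphone track while the assistant speaks. It is not the package's code. The type and function names are hypothetical, and SenderLike is a minimal stand-in for the standard WebRTC RTCRtpSender, whose replaceTrack method accepts null.

```typescript
type MuteStrategy = "auto" | "standard" | "aggressive";
type Platform = "android" | "ios" | "macos" | "web";

// Minimal stand-in for RTCRtpSender so the sketch compiles without DOM types.
interface SenderLike<Track> {
  replaceTrack(track: Track | null): Promise<void>;
}

// auto resolves to aggressive on Android and standard everywhere else.
function resolveMuteStrategy(
  strategy: MuteStrategy,
  platform: Platform,
): "standard" | "aggressive" {
  if (strategy !== "auto") {
    return strategy;
  }
  return platform === "android" ? "aggressive" : "standard";
}

// Aggressive muting: swap the outbound track for null while the assistant
// is speaking, then restore the microphone track afterwards.
function setAssistantSpeaking<Track>(
  sender: SenderLike<Track>,
  micTrack: Track,
  speaking: boolean,
): Promise<void> {
  return sender.replaceTrack(speaking ? null : micTrack);
}
```

The trade-off is that the user cannot barge in by voice while the track is detached, so an explicit interrupt control becomes more important on Android.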
Interrupting the assistant #
To barge in mid-utterance, send the three-step interruption sequence:
await client.cancelResponse(); // 1. stop generation
await client.clearOutputAudioBuffer(); // 2. flush server-side audio queue (WebRTC only)
await client.truncateConversation( // 3. reconcile history with what the user actually heard
  itemId: itemId,
  contentIndex: 0,
  audioEndMs: playbackPositionMs,
);
When turn_detection.interrupt_response is true (the default for
both ServerVad and SemanticVad), the server runs the equivalent
sequence automatically when it detects new user speech. The manual API
is for cases where your UI surfaces an explicit interrupt control.
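On the wire, the three steps correspond to three Realtime client events. The sketch below builds them in order in TypeScript; buildInterruptEvents is a hypothetical helper rather than part of this package, and it assumes the standard event names response.cancel, output_audio_buffer.clear, and conversation.item.truncate.

```typescript
type Transport = "webrtc" | "websocket";

interface ClientEvent {
  type: string;
  [key: string]: unknown;
}

// Build the interruption sequence for a given assistant item.
// audioEndMs is how much of the item's audio the user actually heard.
function buildInterruptEvents(
  itemId: string,
  audioEndMs: number,
  transport: Transport,
): ClientEvent[] {
  const events: ClientEvent[] = [{ type: "response.cancel" }];
  if (transport === "webrtc") {
    // Only WebRTC has a server-side output buffer to flush.
    events.push({ type: "output_audio_buffer.clear" });
  }
  events.push({
    type: "conversation.item.truncate",
    item_id: itemId,
    content_index: 0,
    audio_end_ms: Math.max(0, Math.floor(audioEndMs)),
  });
  return events;
}
```

Order matters: cancelling first stops new audio from being generated, so the truncation position is not overtaken by further output.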
Function calling #
import 'dart:convert'; // jsonDecode / jsonEncode

final tool = Tool(
  name: 'get_weather',
  description: 'Get current weather for a city.',
  parameters: const {
    'type': 'object',
    'properties': {
      'city': {'type': 'string'},
    },
    'required': ['city'],
  },
);
final client = RealtimeClient.webRtc(RealtimeConfig(
  tokenProvider: myTokenProvider,
  tools: [tool],
));
client.events.listen((event) async {
  if (event is ResponseFunctionCallArgumentsDone) {
    // The arguments arrive as a JSON string matching the tool's schema.
    final args = jsonDecode(event.arguments) as Map<String, dynamic>;
    // getWeather is your own app-defined function.
    final result = await getWeather(args['city'] as String);
    await client.createConversationItem(
      ConversationItem.functionCallOutput(
        callId: event.callId,
        output: jsonEncode(result),
      ),
    );
    // The server does NOT auto-respond after a tool result. You must
    // explicitly ask it for the next turn.
    await client.createResponse();
  }
});
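For a server-side or non-Dart caller, the same two-step reply can be expressed as raw Realtime client events. This TypeScript sketch is an illustration under assumptions: the helper names are hypothetical, and the payloads use the standard conversation.item.create (with a function_call_output item) and response.create event types.

```typescript
interface ToolClientEvent {
  type: string;
  [key: string]: unknown;
}

// Parse the JSON argument string from the function-call event defensively.
function parseToolArguments(raw: string): { [key: string]: unknown } {
  const parsed: unknown = JSON.parse(raw);
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new Error("Tool arguments must be a JSON object");
  }
  return parsed as { [key: string]: unknown };
}

// Build the two events that return a tool result and request the next turn.
function buildToolResultEvents(callId: string, result: unknown): ToolClientEvent[] {
  return [
    {
      type: "conversation.item.create",
      item: {
        type: "function_call_output",
        call_id: callId,
        output: JSON.stringify(result), // output is sent as a string
      },
    },
    // Without this, the model stays silent after receiving the tool result.
    { type: "response.create" },
  ];
}
```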
Logging #
The package emits via package:logging. Attach a console listener at
app start:
RealtimeLogging.enableConsoleOutput();
License #
MIT. See LICENSE.