gemini_tts_wrapper
Client-side (REST) Gemini TTS wrapper for one-shot audio output.
This package calls the Generative Language API :generateContent endpoint with
response_mime_type set to an audio type (default: audio/wav) and returns the
decoded audio bytes as Uint8List.
It also includes Uint8ListAudioSource to play those bytes directly from memory
via just_audio (no temporary file required).
Security Note
Using a long-lived Gemini API key directly in a client app is not secure. For production, prefer a server-side proxy (Cloud Functions, Cloud Run, etc.) and short-lived tokens or additional auth.
Installation 💻
❗ In order to start using Gemini Tts Wrapper you must have the Flutter SDK installed on your machine.
Install via flutter pub add:
flutter pub add gemini_tts_wrapper
Usage
Basic Usage
Generate one-shot TTS audio bytes:
import 'package:gemini_tts_wrapper/gemini_tts_wrapper.dart';
final tts = GeminiTts(apiKey: 'YOUR_API_KEY');
final bytes = await tts.generate(
text: 'Merhaba dunya!',
voice: 'aoide',
responseMimeType: 'audio/wav',
);
Advanced Options
final bytes = await tts.generate(
text: 'Your text here',
voice: 'aoide', // 'aoide', 'charon', or 'puck'
responseMimeType: 'audio/wav',
audioProfile: 'headphone', // May not work in all languages
directorsNote: 'Speak in a casual, natural tone', // May be ignored
);
Play Audio
Play in-memory bytes with just_audio:
import 'package:just_audio/just_audio.dart';
import 'package:gemini_tts_wrapper/gemini_tts_wrapper.dart';
final player = AudioPlayer();
await player.setAudioSource(Uint8ListAudioSource(bytes, contentType: 'audio/wav'));
await player.play();
Dialogue/Multi-Speaker Support
To work around voice mixing issues in dialogues, use the DialogueBuilder:
final builder = DialogueBuilder(
context: 'A conversation at a coffee shop',
speakers: {
'Alice': SpeakerConfig(name: 'Alice', voice: 'aoide'),
'Bob': SpeakerConfig(name: 'Bob', voice: 'charon'),
},
);
builder.addLines([
DialogueLine(speaker: 'Alice', text: 'Hey Bob!'),
DialogueLine(
speaker: 'Bob',
text: 'Hi Alice!',
pausesBefore: ['medium pause'], // [short pause] may not work
),
]);
final generator = DialogueGenerator(tts: tts);
final audioSegments = await generator.generatePerSpeaker(builder);
Text Length Validation
The wrapper includes validation to warn about the ~160 second audio limit:
final result = TtsValidator.validateTextLength(longText);
if (!result.isValid) {
print('Warning: ${result.message}');
// Split into chunks
final chunks = TtsValidator.splitIntoChunks(longText);
}
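As an illustration of the idea behind duration-based validation and chunking, here is a minimal, self-contained sketch. It is not the package's implementation: the names estimateSeconds and splitBySentences, the 150 words-per-minute speaking rate, and the 120-second default budget are all assumptions chosen for the example.

```dart
/// Assumed average speaking rate; real rates vary by voice and language.
const int assumedWordsPerMinute = 150;

/// Estimates spoken duration in seconds from a whitespace-separated word count.
double estimateSeconds(String text) {
  final words =
      text.trim().split(RegExp(r'\s+')).where((w) => w.isNotEmpty).length;
  return words * 60 / assumedWordsPerMinute;
}

/// Greedily packs whole sentences into chunks whose estimated duration stays
/// at or below [maxSeconds]. A single sentence that exceeds the budget becomes
/// its own chunk instead of being cut mid-sentence.
List<String> splitBySentences(String text, {double maxSeconds = 120}) {
  // A sentence is a run of non-terminators followed by optional . ! or ?
  final sentences = RegExp(r'[^.!?]+[.!?]*')
      .allMatches(text)
      .map((m) => m.group(0)!.trim())
      .where((s) => s.isNotEmpty);

  final chunks = <String>[];
  final current = StringBuffer();
  for (final sentence in sentences) {
    final candidate = current.isEmpty ? sentence : '$current $sentence';
    if (current.isNotEmpty && estimateSeconds(candidate) > maxSeconds) {
      // Adding this sentence would exceed the budget: close the chunk.
      chunks.add(current.toString());
      current
        ..clear()
        ..write(sentence);
    } else {
      current
        ..clear()
        ..write(candidate);
    }
  }
  if (current.isNotEmpty) chunks.add(current.toString());
  return chunks;
}
```

Each resulting chunk can then be passed to tts.generate separately. Keeping the budget well below 160 seconds leaves headroom for voices that speak more slowly than the assumed rate.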
Known API Limitations (Gemini 3.1 Flash TTS Preview)
⚠️ Important: The Gemini 3.1 Flash TTS API has several known limitations:
1. ~160 Second Audio Hard Limit
- The API accepts unlimited text input but hard-stops audio generation around 160 seconds
- This wrapper validates text length and throws TtsLengthException if the estimated duration exceeds safe limits
- Use TtsValidator.splitIntoChunks() to split long text into manageable segments
2. Voice Mixing in Dialogues
- Speakers frequently read each other's lines (non-deterministic behavior)
- Workaround: Use DialogueGenerator.generatePerSpeaker() to generate each speaker's lines separately
- The example app demonstrates this approach
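The per-speaker workaround boils down to grouping lines by speaker, generating each group with a single voice, and then restoring conversation order. The sketch below shows only that bookkeeping, under stated assumptions: Line, groupBySpeaker, and reassemble are hypothetical names for illustration and are not part of this package, whose DialogueGenerator may work differently internally.

```dart
/// Hypothetical stand-in for a dialogue line; not the package's DialogueLine.
class Line {
  const Line(this.speaker, this.text);
  final String speaker;
  final String text;
}

/// Groups lines by speaker while remembering each line's original position,
/// so per-speaker audio can later be put back in conversation order.
Map<String, List<MapEntry<int, String>>> groupBySpeaker(List<Line> lines) {
  final groups = <String, List<MapEntry<int, String>>>{};
  for (var i = 0; i < lines.length; i++) {
    groups
        .putIfAbsent(lines[i].speaker, () => [])
        .add(MapEntry(i, lines[i].text));
  }
  return groups;
}

/// Restores conversation order from results keyed by original line index.
List<T> reassemble<T>(Map<int, T> resultsByIndex) {
  final indices = resultsByIndex.keys.toList()..sort();
  return [for (final i in indices) resultsByIndex[i] as T];
}
```

Because every request contains text for one speaker only, the model has no opportunity to assign a line to the wrong voice. The trade-off is more requests per dialogue.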
3. Audio Profile & Director's Note
- These parameters may not work consistently, especially in non-English languages (e.g., Finnish)
- Voices may default to dramatic/fake tones even when casual/natural is requested
- These options are available but your mileage may vary
4. Pause Tags
- [short pause] tags are often ignored by the API
- Workaround: Use [medium pause] or [long pause] instead
- The DialogueLine class includes helper properties for adding pauses
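If existing scripts already contain [short pause] tags, one possible approach is to rewrite them before sending the text. The helper below is a hypothetical sketch (upgradeShortPauses is not provided by this package) that simply swaps the tag for [medium pause].

```dart
/// Replaces every `[short pause]` tag, in any letter case, with
/// `[medium pause]`. Other text and tags are left untouched.
String upgradeShortPauses(String text) => text.replaceAll(
      RegExp(r'\[short pause\]', caseSensitive: false),
      '[medium pause]',
    );
```

Whether a medium pause sounds acceptable in place of a short one depends on the content, so it is worth listening to the result.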
5. Chunking Required for Long Content
- For content longer than ~160 seconds of audio, you must implement chunking
- The wrapper provides TtsValidator utilities to help estimate and split content appropriately
Example App
Run the included example:
cd example
flutter run
The example includes:
- Basic TTS generation with all available options
- Text length validation with warnings
- A complete dialogue demo showing multi-speaker support
Continuous Integration 🤖
Gemini Tts Wrapper comes with a built-in GitHub Actions workflow powered by Very Good Workflows, but you can also add your preferred CI/CD solution.
Out of the box, on each pull request and push, the CI formats, lints, and tests the code. This ensures the code remains consistent and behaves correctly as you add functionality or make changes. The project uses Very Good Analysis for a strict set of analysis options used by our team. Code coverage is enforced using Very Good Workflows.
Running Tests 🧪
First-time users should install very_good_cli:
dart pub global activate very_good_cli
To run all unit tests:
very_good test --coverage
To view the generated coverage report you can use lcov.
# Generate Coverage Report
genhtml coverage/lcov.info -o coverage/
# Open Coverage Report
open coverage/index.html