# Whisper Kit

On-device speech-to-text for Flutter using OpenAI Whisper via whisper.cpp. Transcribe audio locally (no cloud API required once models are downloaded). Supports offline transcription, 99 languages, export to SRT/VTT, batch processing, and more.
## Features

- Offline transcription (models downloaded on first use)
- Multiple Whisper model sizes (`tiny`, `base`, `small`, `medium`, `large-v1`, `large-v2`)
- Language auto-detection or fixed language
- Optional translation to English
- Timestamped segments (optional)
- Download progress callback
- Typed exceptions (`ModelException`, `AudioException`, `TranscriptionException`, `PermissionException`)
## Stable API (recommended)

- `Whisper` + `TranscribeRequest` for transcription
- `downloadModel(...)` for manual downloads with progress
- Catch `WhisperKitException` (or its typed subclasses) for errors
## Experimental modules

This package exports a number of optional helpers under `whisper_kit/src/*` (batching, caching, telemetry, cloud storage, etc.). Consider them experimental unless explicitly documented as stable.
## Installation

Add to your `pubspec.yaml`:

```yaml
dependencies:
  whisper_kit: ^0.3.1
```

Then run:

```shell
flutter pub get
```
## Android permissions

If you record audio from the microphone, add the following to your `AndroidManifest.xml`:

```xml
<uses-permission android:name="android.permission.RECORD_AUDIO" />
```

Model download requires network access. After models are downloaded, transcription can run offline.
## Platform Support
| Platform | Status |
|---|---|
| Android | Working |
| iOS | Working (beta) |
| macOS | Experimental |
## Getting Started

### 1. Import the Package

In your Dart code, import the `whisper_kit` library:

```dart
import 'package:whisper_kit/whisper_kit.dart';
```

### 2. Basic Usage Example

Transcribing a WAV file:
**Audio File Transcription**

```dart
import 'package:whisper_kit/whisper_kit.dart';

class TranscriptionExample {
  Future<void> transcribeAudioFile() async {
    final String audioPath = '/path/to/your/audio.wav';

    // Create a Whisper instance with your preferred model.
    final Whisper whisper = Whisper(
      model: WhisperModel.base,
      // Optional: custom download host (defaults to HuggingFace).
      downloadHost: 'https://huggingface.co/ggerganov/whisper.cpp/resolve/main',
      // Optional: download progress (received bytes, total bytes).
      onDownloadProgress: (received, total) {
        final pct = total > 0 ? (received / total * 100).toStringAsFixed(1) : '?';
        print('Model download: $pct%');
      },
    );

    // Create a transcription request.
    final TranscribeRequest request = TranscribeRequest(
      audio: audioPath,
      language: 'auto', // 'auto' for detection, or specify: 'en', 'es', 'fr', etc.
    );

    try {
      final WhisperTranscribeResponse result = await whisper.transcribe(
        transcribeRequest: request,
      );
      print('Transcription: ${result.text}');

      // Access segments if available.
      if (result.segments != null) {
        for (final segment in result.segments!) {
          print('[${segment.fromTs} - ${segment.toTs}]: ${segment.text}');
        }
      }
    } on ModelException catch (e) {
      print('Model error: $e');
    } on AudioException catch (e) {
      print('Audio error: $e');
    } on TranscriptionException catch (e) {
      print('Transcription error: $e');
    }
  }
}
```
**Transcription with Translation**

```dart
import 'package:whisper_kit/whisper_kit.dart';

class TranslationExample {
  Future<void> transcribeAndTranslate() async {
    final Whisper whisper = Whisper(model: WhisperModel.small);

    // Enable translation to English.
    final TranscribeRequest request = TranscribeRequest(
      audio: '/path/to/foreign_language_audio.wav',
      isTranslate: true, // Translates to English
      language: 'auto',  // Auto-detect source language
    );

    try {
      final WhisperTranscribeResponse result = await whisper.transcribe(
        transcribeRequest: request,
      );
      print('Translated text: ${result.text}');
    } catch (e) {
      print('Error: $e');
    }
  }
}
```
**Model Download with Progress Tracking**

```dart
import 'package:whisper_kit/whisper_kit.dart';
import 'package:whisper_kit/download_model.dart';

class ModelManager {
  Future<void> downloadModelWithProgress() async {
    try {
      await downloadModel(
        model: WhisperModel.base,
        destinationPath: '/path/to/model/directory',
        onDownloadProgress: (int received, int total) {
          final progress = (received / total * 100).toStringAsFixed(1);
          print('Download progress: $progress%');
        },
      );
      print('Model downloaded successfully!');
    } catch (e) {
      print('Error downloading model: $e');
    }
  }
}
```
Note: The `Whisper` class automatically downloads the model if it doesn't exist locally when you call `transcribe()`. Manual download via `downloadModel(...)` is only needed if you want to pre-fetch a model yourself (for example, before the first transcription).
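If you want to decide up front whether a download will be needed (for example, to warn users on a metered connection), you can check for the model file yourself. A minimal sketch, assuming the `ggml-<size>.bin` file name used by upstream whisper.cpp models; verify the actual file name your installed version writes, since this naming is an assumption, not a documented whisper_kit API:

```dart
import 'dart:io';

/// Returns true if a model file appears to be present in [modelDir].
/// NOTE: the 'ggml-<size>.bin' naming is an assumption based on the
/// upstream whisper.cpp model files, not a documented whisper_kit API.
bool isModelCached(String modelDir, String modelName) {
  final file = File('$modelDir/ggml-$modelName.bin');
  return file.existsSync() && file.lengthSync() > 0;
}
```

For example, call `isModelCached('/custom/path/to/models', 'base')` before constructing `Whisper` and prompt the user when it returns `false`.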
### 3. Advanced Configuration

```dart
import 'package:whisper_kit/whisper_kit.dart';

class AdvancedTranscription {
  Future<void> transcribeWithCustomSettings() async {
    final Whisper whisper = Whisper(
      model: WhisperModel.small,
      // Optional: specify a custom model storage directory.
      modelDir: '/custom/path/to/models',
    );

    final TranscribeRequest request = TranscribeRequest(
      audio: '/path/to/audio.wav',
      language: 'en',        // Specify language or 'auto' for detection
      isTranslate: false,    // Set to true to translate to English
      isNoTimestamps: false, // Set to true to skip segment timestamps
      splitOnWord: true,     // Split segments on word boundaries
      threads: 4,            // Number of threads to use
      nProcessors: 2,        // Number of processors to use
      isVerbose: true,       // Enable verbose output
    );

    try {
      final WhisperTranscribeResponse result = await whisper.transcribe(
        transcribeRequest: request,
      );
      print('Transcription: ${result.text}');

      // Process segments with timestamps.
      if (result.segments != null) {
        for (final segment in result.segments!) {
          print('${segment.fromTs} -> ${segment.toTs}: ${segment.text}');
        }
      }
    } catch (e) {
      print('Error: $e');
    }
  }

  Future<void> getWhisperVersion() async {
    final Whisper whisper = Whisper(model: WhisperModel.none);
    final String? version = await whisper.getVersion();
    print('Whisper version: $version');
  }
}
```
---
## Screenshots
*(Screenshot images omitted. The example app includes: recording screen, configuration options, model download progress, result display, audio management, status indicators, progress widgets, processing display, and the main interface.)*
---
## API Reference
### Core Classes
#### `Whisper`
The main class for transcription operations:
```dart
const Whisper({
  required WhisperModel model,            // Required: the model to use
  String? modelDir,                       // Optional: custom model storage directory
  String? downloadHost,                   // Optional: custom model download URL
  Function(int, int)? onDownloadProgress, // Optional: download progress callback
});
```

Methods:

- `Future<WhisperTranscribeResponse> transcribe({required TranscribeRequest transcribeRequest})` - Transcribe an audio file
- `Future<String?> getVersion()` - Get the Whisper library version
#### `TranscribeRequest`

Configuration for a transcription request:

```dart
factory TranscribeRequest({
  required String audio,        // Path to audio file (WAV format recommended)
  bool isTranslate = false,     // Translate to English
  int threads = 6,              // Number of threads
  bool isVerbose = false,       // Verbose output
  String language = 'auto',     // Language code or 'auto' for detection
  bool isSpecialTokens = false, // Include special tokens
  bool isNoTimestamps = false,  // Skip timestamp generation
  int nProcessors = 1,          // Number of processors
  bool splitOnWord = false,     // Split on word boundaries
  bool noFallback = false,      // Disable fallback
  bool diarize = false,         // Speaker-turn detection (tinydiarize)
  bool speedUp = false,         // Speed up processing (quality tradeoff)
});
```
#### `WhisperTranscribeResponse`

The transcription result:

- `String text` - The transcribed text
- `List<WhisperTranscribeSegment>? segments` - Timestamped segments (if timestamps enabled)

#### `WhisperTranscribeSegment`

A segment of the transcription with timestamps:

- `Duration fromTs` - Start timestamp
- `Duration toTs` - End timestamp
- `String text` - The segment text

#### `WhisperModel`

Enum for available model sizes:

- `WhisperModel.none` - No model (for version check only)
- `WhisperModel.tiny` - Fastest, least accurate (~75MB)
- `WhisperModel.base` - Good balance (~142MB)
- `WhisperModel.small` - Better accuracy (~466MB)
- `WhisperModel.medium` - Best accuracy (~1.5GB)
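The package description mentions SRT/VTT export; if you need SRT output without depending on the experimental helpers, a minimal hand-rolled formatter over `segments` is straightforward. This sketch assumes only the documented `fromTs`/`toTs`/`text` fields and is not the package's built-in export:

```dart
/// Formats a Duration as an SRT timestamp (HH:MM:SS,mmm).
String srtTimestamp(Duration d) {
  String two(int n) => n.toString().padLeft(2, '0');
  final ms = (d.inMilliseconds % 1000).toString().padLeft(3, '0');
  return '${two(d.inHours)}:${two(d.inMinutes % 60)}:${two(d.inSeconds % 60)},$ms';
}

/// Builds SRT text from a transcription's timestamped segments.
String toSrt(List<WhisperTranscribeSegment> segments) {
  final buffer = StringBuffer();
  for (var i = 0; i < segments.length; i++) {
    final s = segments[i];
    buffer
      ..writeln(i + 1) // SRT cue numbering starts at 1
      ..writeln('${srtTimestamp(s.fromTs)} --> ${srtTimestamp(s.toTs)}')
      ..writeln(s.text.trim())
      ..writeln(); // Blank line terminates each cue
  }
  return buffer.toString();
}
```

Call `toSrt(result.segments!)` after a transcription with timestamps enabled and write the string to a `.srt` file.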
#### `downloadModel`

Standalone function to download models with progress tracking:

```dart
Future<void> downloadModel({
  required WhisperModel model,
  required String destinationPath,
  String? downloadHost,
  Function(int received, int total)? onDownloadProgress,
});
```
## Audio Requirements

### Supported Formats

The native core currently accepts WAV input only: 16kHz, 16-bit PCM, mono or stereo. Other formats (mp3/m4a/flac/ogg) must be converted to WAV before calling `transcribe()`.
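One way to do that conversion from Dart is to shell out to ffmpeg. This is a sketch that assumes `ffmpeg` is installed and on the PATH, which is realistic on desktop targets but not on stock mobile devices, where an audio-conversion plugin is the usual choice:

```dart
import 'dart:io';

/// Converts [inputPath] to the 16kHz, 16-bit PCM, mono WAV format
/// expected by the native core, writing the result to [outputPath].
Future<void> convertToWhisperWav(String inputPath, String outputPath) async {
  final result = await Process.run('ffmpeg', [
    '-y',                // Overwrite the output file if it exists
    '-i', inputPath,     // Input file (mp3/m4a/flac/ogg/...)
    '-ar', '16000',      // Resample to 16kHz
    '-ac', '1',          // Downmix to mono
    '-c:a', 'pcm_s16le', // 16-bit little-endian PCM encoding
    outputPath,
  ]);
  if (result.exitCode != 0) {
    throw Exception('ffmpeg failed: ${result.stderr}');
  }
}
```

Pass the resulting WAV path to `TranscribeRequest(audio: ...)` as usual.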
## Audio Quality Tips
- Use a quiet environment for best results
- Speak clearly at a normal pace
- Ensure proper microphone placement
- Audio should be at least 1 second long for optimal transcription
## Error Handling

Catch typed exceptions for reliable handling:

```dart
try {
  final result = await whisper.transcribe(transcribeRequest: request);
  print(result.text);
} on ModelException catch (e) {
  // Download/validation/model path issues.
  print(e);
} on AudioException catch (e) {
  // WAV requirements not met, file missing, etc.
  print(e);
} on TranscriptionException catch (e) {
  // Native processing errors.
  print(e);
}
```
## Performance Considerations
- Model Size: Larger models provide better accuracy but require more processing time and memory
- Device Requirements: Minimum 4GB RAM recommended for smooth operation
- Battery Usage: Continuous transcription can be battery-intensive
- Storage: Ensure sufficient space for downloaded models (75MB - 1.5GB per model)
## Project Structure

```
├── lib/                    # Public Dart API
│   ├── whisper_kit.dart    # Main entrypoint
│   └── download_model.dart # Model download helper
├── src/                    # Native whisper.cpp bridge (C/C++)
├── ios/src/                # iOS native whisper.cpp bridge (C/C++)
├── ios/Classes/            # iOS method-channel implementation
├── android/                # Android build scaffolding for the FFI plugin
└── example/                # Minimal demo app using bundled WAV assets
```
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
## Development Setup

1. Clone the repository
2. Run `flutter pub get` in the root directory
3. Navigate to the example app: `cd example`
4. Run the example: `flutter run`
## License
This project is licensed under the MIT License - see the LICENSE file for details.
## Acknowledgments
- OpenAI Whisper for the original speech recognition model
- whisper.cpp for the efficient C++ implementation
- Flutter community for feedback and support
## Documentation

- `doc/GETTING_STARTED.md`
- `doc/API_REFERENCE.md`
- `doc/PERFORMANCE_GUIDE.md`