flutter_whisper_kit 0.2.0
A Flutter plugin for on-device speech recognition and transcription using WhisperKit.
flutter_whisper_kit #
A Flutter plugin that provides on-device speech recognition capabilities using WhisperKit. Achieve high-quality speech-to-text transcription while maintaining privacy.
Features #
- 🔒 Complete On-Device Processing - No data is sent to external servers
- 🎯 High-Accuracy Speech Recognition - High-quality transcription with Whisper models
- 📱 Multiple Model Sizes - Choose from tiny to large based on your needs
- 🎙️ Real-Time Transcription - Transcribe microphone audio to text in real time
- 📁 File-Based Transcription - Support for audio file transcription
- 📊 Progress Tracking - Monitor model download progress
- 🌍 Multi-Language Support - Supports 100+ languages
- ⚡ Type-Safe Error Handling - Safe error handling with Result type
Platform Support #
| Platform | Minimum Version | Status |
|---|---|---|
| iOS | 16.0+ | ✅ Fully Supported |
| macOS | 13.0+ | ✅ Fully Supported |
| Android | - | 🚧 Planned for Future Release |
Installation #
Add to your pubspec.yaml:
dependencies:
  flutter_whisper_kit: ^0.2.0
iOS Configuration #
Add these permissions to your iOS app's ios/Runner/Info.plist:
<key>NSMicrophoneUsageDescription</key>
<string>This app needs microphone access to record audio for speech transcription</string>
<key>NSDownloadsFolderUsageDescription</key>
<string>This app needs to access your Downloads folder to store WhisperKit models</string>
<key>NSDocumentsFolderUsageDescription</key>
<string>This app needs to access your Documents folder to store WhisperKit models</string>
macOS Configuration #
Add these permissions to your macOS app's macos/Runner/Info.plist:
<key>NSMicrophoneUsageDescription</key>
<string>This app needs microphone access to record audio for speech transcription</string>
<key>NSLocalNetworkUsageDescription</key>
<string>This app needs to access your local network to download WhisperKit models</string>
<key>NSDownloadsFolderUsageDescription</key>
<string>This app needs to access your Downloads folder to store WhisperKit models</string>
<key>NSDocumentsFolderUsageDescription</key>
<string>This app needs to access your Documents folder to store WhisperKit models</string>
Also, ensure your macOS deployment target is set to 13.0 or higher in macos/Runner.xcodeproj/project.pbxproj:
MACOSX_DEPLOYMENT_TARGET = 13.0;
Usage #
Basic Usage #
import 'package:flutter_whisper_kit/flutter_whisper_kit.dart';

// Create an instance of the plugin
final whisperKit = FlutterWhisperKit();

// Load a model
final result = await whisperKit.loadModel(
  'tiny', // Model size: tiny, base, small, medium, large-v2, large-v3
  modelRepo: 'argmaxinc/whisperkit-coreml',
);
print('Model loaded: $result');

// Transcribe from audio file
final transcription = await whisperKit.transcribeFromFile(
  '/path/to/audio/file.mp3',
  options: DecodingOptions(
    task: DecodingTask.transcribe,
    language: 'en', // Specify language (null for auto-detection)
  ),
);
print('Transcription: ${transcription?.text}');
Real-Time Transcription #
// Listen to transcription stream
whisperKit.transcriptionStream.listen((transcription) {
  print('Real-time transcription: ${transcription.text}');
});

// Start recording
await whisperKit.startRecording(
  options: DecodingOptions(
    task: DecodingTask.transcribe,
    language: 'en',
  ),
);

// Stop recording
final finalTranscription = await whisperKit.stopRecording();
print('Final transcription: ${finalTranscription?.text}');
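If you keep the StreamSubscription returned by listen, you can cancel it once recording has stopped so the listener does not outlive the session. This is standard Dart stream handling rather than a plugin-specific API:

// Keep the subscription so it can be cancelled later.
final subscription = whisperKit.transcriptionStream.listen((transcription) {
  print('Real-time transcription: ${transcription.text}');
});

// ... later, after stopRecording() completes:
await subscription.cancel();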
Error Handling (Result Type) #
Since v0.2.0, Result type APIs have been added for safer error handling:
// Load model with Result type
final loadResult = await whisperKit.loadModelWithResult(
  'tiny',
  modelRepo: 'argmaxinc/whisperkit-coreml',
);

loadResult.when(
  success: (modelPath) {
    print('Model loaded successfully: $modelPath');
  },
  failure: (error) {
    print('Model loading failed: ${error.message}');
    // Handle errors by error code
    switch (error.code) {
      case WhisperKitErrorCode.modelNotFound:
        // Handle model not found
        break;
      case WhisperKitErrorCode.networkError:
        // Handle network error
        break;
      default:
        // Handle other errors
    }
  },
);

// Transcribe with Result type
final transcribeResult = await whisperKit.transcribeFileWithResult(
  audioPath,
  options: DecodingOptions(language: 'en'),
);

// Handle success/failure with fold method
final text = transcribeResult.fold(
  onSuccess: (result) => result?.text ?? 'No result',
  onFailure: (error) => 'Error: ${error.message}',
);
Model Management #
// Fetch available models
final models = await whisperKit.fetchAvailableModels(
  modelRepo: 'argmaxinc/whisperkit-coreml',
);

// Get recommended models
final recommended = await whisperKit.recommendedModels();
print('Recommended model: ${recommended?.defaultModel}');

// Download model with progress
await whisperKit.download(
  variant: 'base',
  repo: 'argmaxinc/whisperkit-coreml',
  onProgress: (progress) {
    print('Download progress: ${(progress.fractionCompleted * 100).toStringAsFixed(1)}%');
  },
);

// Monitor model progress stream
whisperKit.modelProgressStream.listen((progress) {
  print('Model progress: ${progress.fractionCompleted * 100}%');
});
Language Detection #
// Detect language from audio file
final detection = await whisperKit.detectLanguage(audioPath);
if (detection != null) {
  print('Detected language: ${detection.language}');
  print('Confidence: ${detection.probabilities[detection.language]}');
}
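To inspect the other candidates, you can rank the probabilities map. This is a minimal sketch that assumes probabilities maps language codes to confidence scores, as the lookup above suggests:

// Rank candidate languages by confidence
// (assumes probabilities maps language codes to scores, as used above).
if (detection != null) {
  final ranked = detection.probabilities.entries.toList()
    ..sort((a, b) => b.value.compareTo(a.value));
  for (final entry in ranked.take(3)) {
    print('${entry.key}: ${(entry.value * 100).toStringAsFixed(1)}%');
  }
}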
Advanced Configuration #
// Custom decoding options
final options = DecodingOptions(
  verbose: true, // Enable verbose logging
  task: DecodingTask.transcribe, // transcribe or translate
  language: 'en', // Language code (null for auto-detection)
  temperature: 0.0, // Sampling temperature (0.0-1.0)
  temperatureFallbackCount: 5, // Temperature fallback count
  wordTimestamps: true, // Enable word timestamps
  chunkingStrategy: ChunkingStrategy.vad, // Chunking strategy
);

// Detailed transcription results
final result = await whisperKit.transcribeFromFile(audioPath, options: options);
if (result != null) {
  print('Text: ${result.text}');
  print('Language: ${result.language}');

  // Segment information
  for (final segment in result.segments) {
    print('Segment ${segment.id}: ${segment.text}');
    print('  Start: ${segment.startTime}s, End: ${segment.endTime}s');

    // Word timing information (if wordTimestamps: true)
    for (final word in segment.words) {
      print('  Word: ${word.word} (${word.start}s - ${word.end}s)');
    }
  }
}
Model Size Selection #
Choose the appropriate model size based on your use case:
| Model | Size | Speed | Accuracy | Use Case |
|---|---|---|---|---|
| tiny | ~39MB | Very Fast | Low | Real-time processing, battery-conscious |
| tiny-en | ~39MB | Very Fast | Low (English only) | English-only real-time processing |
| base | ~145MB | Fast | Medium | Balanced performance |
| small | ~466MB | Medium | High | When higher accuracy is needed |
| medium | ~1.5GB | Slow | Higher | When even higher accuracy is needed |
| large-v2 | ~2.9GB | Very Slow | Very High | When maximum accuracy is needed |
| large-v3 | ~2.9GB | Very Slow | Highest | Latest and highest accuracy |
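For example, an app can pick a variant from the table based on its own priorities and pass it to loadModel as shown earlier. In this sketch, preferAccuracy is a hypothetical flag standing in for your own selection logic:

// Hypothetical flag: trade speed (tiny) against accuracy (small).
const preferAccuracy = false;
final variant = preferAccuracy ? 'small' : 'tiny';

final modelPath = await whisperKit.loadModel(
  variant,
  modelRepo: 'argmaxinc/whisperkit-coreml',
);
print('Loaded $variant at: $modelPath');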
Example App #
The example folder contains a sample app that demonstrates all features:
cd packages/flutter_whisper_kit/example
flutter run
Troubleshooting #
Build errors on iOS/macOS #
- Check minimum deployment target (iOS 16.0+, macOS 13.0+)
- Update to the latest Xcode
- Run pod install
Model download fails #
- Check network connection
- Ensure sufficient storage space
- Try the redownload: true option to force a fresh download (see the sketch below)
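A corrupted or partial download can often be fixed by forcing a fresh copy. This minimal sketch assumes the redownload flag mentioned above is accepted by loadModel; check the API reference for the exact parameter placement:

// Force a fresh download of the model files
// (assumption: loadModel accepts the redownload flag noted above).
final result = await whisperKit.loadModel(
  'tiny',
  modelRepo: 'argmaxinc/whisperkit-coreml',
  redownload: true,
);
print('Model re-downloaded and loaded: $result');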
Low transcription accuracy #
- Try a larger model size
- Explicitly specify the language with the language parameter
- Adjust the temperature parameter (0.0 for more deterministic, 1.0 for more creative output)
License #
This project is licensed under the MIT License. See the LICENSE file for details.
Contributing #
Pull requests are welcome! For major changes, please open an issue first to discuss what you would like to change.
Acknowledgments #
This plugin is based on WhisperKit. Thanks to the Argmax Inc. team for providing this excellent library.