whisper_ggml_plus 1.2.12
whisper_ggml_plus: ^1.2.12
Whisper.cpp Flutter plugin with Large-v3-Turbo (128-mel) support.
Whisper GGML Plus #
High-performance OpenAI Whisper ASR (Automatic Speech Recognition) for Flutter using the latest Whisper.cpp v1.8.3 engine. Fully optimized for Large-v3-Turbo and hardware acceleration.
Key Upgrades in "Plus" Version #
- Major Engine Upgrade: Synchronized with `whisper.cpp` v1.8.3, featuring the new dynamic `ggml-backend` architecture.
- Large-v3-Turbo Support: Native support for 128 mel bands, allowing you to use the latest Turbo models with high accuracy and speed.
- Hardware Acceleration: Out-of-the-box support for CoreML (NPU) and Metal (GPU) on iOS and macOS.
- Persistent Context: Models are cached in memory. After the first load, subsequent transcriptions start instantly without re-loading weights.
- GGUF Support: Compatible with the modern GGUF model format for better performance and memory efficiency.
Supported platforms #
| Platform | Supported | Acceleration | VAD |
|---|---|---|---|
| Android | ✅ | CPU (SIMD) | ❌ |
| iOS | ✅ | CoreML/Metal | ✅ |
| macOS | ✅ | Metal | ✅ |
Features #
- Automatic Speech Recognition: Seamless integration for Flutter apps.
- Offline Capability: Can be configured to work fully offline by using models from local assets.
- Multilingual: Auto-detect language or specify codes like "en", "ko", "ja", etc.
- VAD (Voice Activity Detection): Automatic silence skipping for 2-3x faster transcription on iOS/macOS.
- Flash Attention: Enabled for better performance on supported hardware.
Installation #
Add the library to your Flutter project's pubspec.yaml:
```yaml
dependencies:
  whisper_ggml_plus: ^1.2.12
```
Run `flutter pub get` to install the package.
Usage #
1. Import the package #
```dart
import 'package:whisper_ggml_plus/whisper_ggml_plus.dart';
```
2. Pick your model #
For the best performance on mobile, the `tiny`, `base`, or `small` models are recommended. For high accuracy, use `largeV3Turbo`.
```dart
final model = WhisperModel.largeV3Turbo; // Native support for Turbo (128 mel bands)
```
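For example, a platform-dependent choice might look like the sketch below, assuming the enum also exposes the `tiny`, `base`, and `small` variants named above:
```dart
import 'dart:io';

import 'package:whisper_ggml_plus/whisper_ggml_plus.dart';

/// Picks a smaller model on Android (CPU-only) and Turbo on iOS/macOS,
/// where CoreML/Metal acceleration is available.
/// Assumes WhisperModel exposes a `base` variant alongside `largeV3Turbo`.
WhisperModel pickModel() {
  if (Platform.isIOS || Platform.isMacOS) {
    return WhisperModel.largeV3Turbo;
  }
  return WhisperModel.base;
}
```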
3. Transcribe Audio #
Declare a WhisperController and use it for transcription.
```dart
final controller = WhisperController();

final result = await controller.transcribe(
  model: model,
  audioPath: audioPath,
  lang: 'auto', // 'en', 'ko', 'ja', or 'auto' for detection
  withTimestamps: true, // Set to false to omit timestamps
  convert: true, // Set to false if audioPath is already 16 kHz mono .wav
);
```
4. Handle Result #
```dart
if (result != null) {
  print("Transcription: ${result.transcription.text}");
  // Segments are available when withTimestamps is true
  for (var segment in result.transcription.segments) {
    print("[${segment.fromTs} -> ${segment.toTs}] ${segment.text}");
  }
}
```
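If you need subtitle-style output, here is a minimal sketch built on the same segment fields, assuming (as the loop above does) that `fromTs` and `toTs` stringify to readable timestamps:
```dart
/// Builds an SRT-like string from transcription segments.
/// A sketch: relies only on fromTs/toTs/text stringifying cleanly,
/// exactly as the print loop above does.
String toSrtLike(Iterable<dynamic> segments) {
  final buffer = StringBuffer();
  var index = 1;
  for (final segment in segments) {
    buffer
      ..writeln(index++)
      ..writeln('${segment.fromTs} --> ${segment.toTs}')
      ..writeln(segment.text)
      ..writeln();
  }
  return buffer.toString();
}
```
Call it as `toSrtLike(result.transcription.segments)`.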
Optimization Tips #
- Release Mode: Always test performance in `--release` mode. Native optimizations (SIMD/Metal) are significantly more effective there.
- Model Quantization: Use quantized models (e.g., `q4_0`, `q5_0`, or `q2_k`) to reduce RAM usage, especially when running Large-v3-Turbo on mobile devices.
- Naming Convention for CoreML: To ensure CoreML detection works, keep the quantization suffix in the filename in the 5-character format (e.g., `ggml-large-v3-turbo-q5_0.bin`). The engine uses this to locate the `-encoder.mlmodelc` directory; a quick filename check is sketched below.
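If you rename model files yourself, a check that a filename keeps a detectable suffix might look like this sketch (hypothetical helper; the regex covers the common `q*_0`/`q*_k` variants):
```dart
/// Returns true if the filename ends with a 5-character quantization
/// suffix (e.g. "-q5_0") that whisper.cpp can strip when locating the
/// matching -encoder.mlmodelc directory. Not part of the plugin API.
bool hasDetectableQuantSuffix(String fileName) {
  return RegExp(r'-q\d_[\dk]\.bin$').hasMatch(fileName);
}
```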
CoreML Acceleration (Optional) #
For 3x+ faster transcription on Apple Silicon devices (M1+, A14+), you can optionally add a CoreML encoder:
What is .mlmodelc?
`.mlmodelc` is a compiled CoreML model directory (not a single file) containing:
- `model.mil` - CoreML model intermediate language
- `coremldata.bin` - model weights optimized for the Apple Neural Engine
- `metadata.json` - model configuration
Important: `.mlmodelc` is a directory with multiple files, not a single file. This affects how you deploy it.
1. Generate CoreML Encoder
```bash
# Clone whisper.cpp repository
git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp

# Create Python 3.11 environment
python3.11 -m venv venv
source venv/bin/activate

# Install dependencies
pip install torch==2.5.0 "numpy<2.0" coremltools==8.1 openai-whisper ane_transformers

# Generate CoreML encoder (example: large-v3-turbo)
./models/generate-coreml-model.sh large-v3-turbo
# Output: models/ggml-large-v3-turbo-encoder.mlmodelc/ (directory, ~1.2GB)
```
2. Deploy CoreML Model
⚠️ CRITICAL: .mlmodelc cannot be bundled via Flutter assets!
Flutter's asset system doesn't preserve directory structure for custom folders, which breaks CoreML models. You must use one of these deployment methods:
Option A: Download at Runtime (Recommended)
```dart
import 'dart:io';

import 'package:http/http.dart' as http;
import 'package:path_provider/path_provider.dart';

Future<String> downloadCoreMLModel() async {
  final appSupport = await getApplicationSupportDirectory();
  final modelDir = Directory('${appSupport.path}/models');
  await modelDir.create(recursive: true);

  final mlmodelcDir = Directory('${modelDir.path}/ggml-large-v3-turbo-encoder.mlmodelc');
  if (!await mlmodelcDir.exists()) {
    // Download the contents of the .mlmodelc directory from your server.
    // Each file inside .mlmodelc must be downloaded separately.
    await mlmodelcDir.create(recursive: true);
    await downloadFile('https://your-cdn.com/model.mil', '${mlmodelcDir.path}/model.mil');
    await downloadFile('https://your-cdn.com/coremldata.bin', '${mlmodelcDir.path}/coremldata.bin');
    await downloadFile('https://your-cdn.com/metadata.json', '${mlmodelcDir.path}/metadata.json');
  }

  return '${modelDir.path}/ggml-large-v3-turbo-q3_k.bin';
}

// Minimal helper: fetch a URL and write the bytes to disk.
Future<void> downloadFile(String url, String destPath) async {
  final response = await http.get(Uri.parse(url));
  await File(destPath).writeAsBytes(response.bodyBytes);
}
```
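Downloading each file separately works, but if you control the server it may be simpler to host the directory as a single zip and unpack it on device. A sketch using `package:archive` (hypothetical URL; assumes the zip preserves the `.mlmodelc` directory structure):
```dart
import 'dart:io';

import 'package:archive/archive.dart';
import 'package:http/http.dart' as http;

/// Downloads a zipped .mlmodelc and unpacks it into [modelsDir],
/// recreating the directory structure stored in the archive.
Future<void> downloadCoreMLZip(String url, String modelsDir) async {
  final response = await http.get(Uri.parse(url));
  final archive = ZipDecoder().decodeBytes(response.bodyBytes);
  for (final entry in archive) {
    if (!entry.isFile) continue; // parent directories are created below
    final out = File('$modelsDir/${entry.name}');
    await out.create(recursive: true);
    await out.writeAsBytes(entry.content as List<int>);
  }
}
```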
Option B: iOS/macOS Native Bundle (Advanced)
For iOS, manually add .mlmodelc to the Xcode project:
- Open `ios/Runner.xcworkspace` in Xcode
- Drag the `.mlmodelc` folder into the project navigator
- Ensure "Create folder references" (not "Create groups") is selected
- Add to target: Runner
Then access via bundle path:
```dart
import 'dart:io';

String getCoreMLPath() {
  if (Platform.isIOS || Platform.isMacOS) {
    // Xcode bundles .mlmodelc as a folder reference;
    // replace with the actual path inside your app bundle.
    return '/path/in/bundle/ggml-large-v3-turbo-encoder.mlmodelc';
  }
  return '';
}
```
3. Place CoreML Encoder Alongside GGML Model
```
/app/support/models/
├── ggml-large-v3-turbo-q3_k.bin
└── ggml-large-v3-turbo-encoder.mlmodelc/   ← must be in the same directory
    ├── model.mil
    ├── coremldata.bin
    └── metadata.json
```
Naming Convention:
- GGML model: `ggml-{model-name}-{quantization}.bin`
- CoreML model: `ggml-{model-name}-encoder.mlmodelc/` (base names must match)

Example pairs:
- `ggml-large-v3-turbo-q3_k.bin` + `ggml-large-v3-turbo-encoder.mlmodelc/`
- `ggml-base-q5_0.bin` + `ggml-base-encoder.mlmodelc/`
4. Use Normally
```dart
final result = await controller.transcribe(
  model: '/app/support/models/ggml-large-v3-turbo-q3_k.bin',
  audioPath: audioPath,
  lang: 'auto',
);
// whisper.cpp automatically detects and uses the CoreML encoder if present
```
How Detection Works
When you load a GGML model (e.g., ggml-large-v3-turbo-q3_k.bin), whisper.cpp automatically:
- Strips the quantization suffix: `ggml-large-v3-turbo-q3_k.bin` → `ggml-large-v3-turbo`
- Looks for `ggml-large-v3-turbo-encoder.mlmodelc/` in the same directory
- If found and valid: uses CoreML (NPU) acceleration
- If not found: falls back to Metal (GPU) acceleration
No code changes needed - detection is automatic!
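To sanity-check a deployment from Dart before transcribing, here is a minimal sketch that mirrors the same lookup (hypothetical pre-flight helper, not part of the plugin API):
```dart
import 'dart:io';

/// Mirrors whisper.cpp's CoreML lookup: strips the quantization suffix
/// from the GGML filename and checks for a sibling -encoder.mlmodelc
/// directory next to it.
bool hasCoreMLEncoder(String ggmlModelPath) {
  final file = File(ggmlModelPath);
  final fileName = file.uri.pathSegments.last;
  // ggml-large-v3-turbo-q3_k.bin -> ggml-large-v3-turbo
  final baseName = fileName
      .replaceAll(RegExp(r'-q\d_[\dk]\.bin$'), '')
      .replaceAll(RegExp(r'\.bin$'), '');
  return Directory('${file.parent.path}/$baseName-encoder.mlmodelc').existsSync();
}
```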
Troubleshooting
CoreML model not detected:
```
[CoreML Debug] whisper_coreml_init called
[CoreML Error] Model file/directory does not exist at path: /path/to/model.mlmodelc
```
Common causes:
- Wrong path: `.mlmodelc` must be in the same directory as the `.bin` file
- Not a directory: `.mlmodelc` is a directory, not a file - check with a file manager
- Flutter assets: cannot be bundled via `pubspec.yaml` assets - use a runtime download or a native bundle
- Name mismatch: base names must match (e.g., `ggml-base-q5_0.bin` needs `ggml-base-encoder.mlmodelc`)
Check if CoreML is working:
```
[CoreML Debug] CoreML model loaded successfully!
```
If you see this log, CoreML (NPU) is active. Otherwise, Metal (GPU) is used.
Performance Comparison
| Acceleration | Device | Speed | Battery | Storage |
|---|---|---|---|---|
| CoreML (NPU) | Apple Silicon | 3-5x faster | Best | +1.2GB |
| Metal (GPU) | iOS/macOS | 2-3x faster | Good | - |
| CPU (SIMD) | Android | 1x (baseline) | Fair | - |
Recommendation:
- Large-v3-Turbo: Use CoreML if storage allows - significant speed + battery improvement
- Base/Small models: Metal is sufficient - CoreML overhead not worth it
- Android: CoreML not available - CPU SIMD only
Notes
- CoreML encoder works with all quantization variants (q3_k, q5_0, q8_0, etc.) of the same base model
- If `.mlmodelc` is not present, Metal (GPU) acceleration is used automatically on iOS/macOS
- CoreML requires ~1.2GB additional storage per model but provides a 3x+ speedup and better battery life
- Android does not support CoreML - CPU optimization only
License #
MIT License - Based on the original work by sk3llo/whisper_ggml.