# Whisper GGML Plus
High-performance OpenAI Whisper ASR (Automatic Speech Recognition) for Flutter using the latest Whisper.cpp v1.8.3 engine. Fully optimized for Large-v3-Turbo and hardware acceleration.
## ✨ Key Upgrades in the "Plus" Version

- **Major Engine Upgrade:** Synchronized with `whisper.cpp` v1.8.3, featuring the new dynamic `ggml-backend` architecture.
- **Large-v3-Turbo Support:** Native support for 128 mel bands for high accuracy and speed.
- **Hardware Acceleration:** Out-of-the-box support for CoreML (NPU) and Metal (GPU) on iOS and macOS.
- **FFmpeg Decoupling (v1.3.0+):** No more library conflicts! The core engine is now FFmpeg-free. Use `whisper_ggml_plus_ffmpeg` for automatic conversion.
- **Persistent Context:** Models are cached in memory for instant subsequent transcriptions (see the sketch below).
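Given the caching behavior described above, the practical takeaway is to keep a single controller alive instead of recreating it per call. A minimal sketch using only the API shown in this README (file names are illustrative):

```dart
import 'package:whisper_ggml_plus/whisper_ggml_plus.dart';

// Keep one controller alive so the cached model context is reused.
final controller = WhisperController();
final model = WhisperModel.largeV3Turbo;

Future<void> transcribeBoth() async {
  // The first call pays the one-time model load cost.
  await controller.transcribe(model: model, audioPath: 'clip_a.wav');
  // Subsequent calls reuse the cached context and start immediately.
  await controller.transcribe(model: model, audioPath: 'clip_b.wav');
}
```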
## 🚀 Getting Started
Starting from v1.3.0, FFmpeg is no longer bundled with the core engine to prevent version conflicts.
### For 16kHz Mono WAV files
If your audio is already in the correct format, just use the core package.
```dart
final controller = WhisperController();
final result = await controller.transcribe(
  model: model,
  audioPath: 'audio_16khz_mono.wav',
);
```
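If you are unsure whether a recording already meets the 16kHz mono requirement, a quick header check is possible in pure Dart. This is a sketch assuming a canonical 44-byte RIFF/PCM header; WAV files with extra chunks would need a real parser:

```dart
import 'dart:io';
import 'dart:typed_data';

/// Returns true if [path] looks like a 16kHz mono PCM WAV file.
Future<bool> isWhisperReadyWav(String path) async {
  final bytes = await File(path).openRead(0, 44).expand((c) => c).toList();
  if (bytes.length < 44) return false;
  final data = ByteData.sublistView(Uint8List.fromList(bytes));
  final channels = data.getUint16(22, Endian.little);   // 1 = mono
  final sampleRate = data.getUint32(24, Endian.little); // expect 16000
  return channels == 1 && sampleRate == 16000;
}
```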
### For MP3, MP4, and other formats
Install the companion package to enable automatic conversion without library conflicts.
- Add both packages:

  ```yaml
  dependencies:
    whisper_ggml_plus: ^1.3.0
    whisper_ggml_plus_ffmpeg: ^1.0.0 # Companion package
  ```
- Register the converter once at app startup:

  ```dart
  import 'package:whisper_ggml_plus/whisper_ggml_plus.dart';
  import 'package:whisper_ggml_plus_ffmpeg/whisper_ggml_plus_ffmpeg.dart';

  void main() {
    // Register the FFmpeg converter once
    WhisperFFmpegConverter.register();
    runApp(MyApp());
  }
  ```
- Transcribe any format normally:

  ```dart
  final result = await controller.transcribe(
    model: model,
    audioPath: 'recording.mp3', // Automatically converted to 16kHz WAV
  );
  ```
## 🛠️ Usage

### 1. Import the package

```dart
import 'package:whisper_ggml_plus/whisper_ggml_plus.dart';
```

### 2. Pick your model

```dart
final model = WhisperModel.largeV3Turbo;
```

### 3. Transcribe Audio

```dart
final controller = WhisperController();
final result = await controller.transcribe(
  model: model,
  audioPath: audioPath,
  lang: 'auto',
  withTimestamps: true,
  threads: 6,
);
```

### 4. Handle Result

```dart
if (result != null) {
  print("Transcription: ${result.transcription.text}");
  // Segments are available if withTimestamps is true
  for (var segment in result.transcription.segments) {
    print("[${segment.fromTs} -> ${segment.toTs}] ${segment.text}");
  }
}
```
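With timestamps enabled, the segment list maps naturally onto subtitle formats. A minimal SRT-style export sketch, assuming `fromTs`/`toTs` are printable timestamps as in the loop above (a real `.srt` file needs the `HH:MM:SS,mmm` format):

```dart
import 'dart:io';

/// Writes segments to a minimal .srt-style file. The transcription
/// type is kept dynamic here since only fromTs/toTs/text are assumed.
Future<void> writeSrt(dynamic transcription, String outPath) async {
  final buffer = StringBuffer();
  var index = 1;
  for (final segment in transcription.segments) {
    buffer
      ..writeln(index++)
      ..writeln('${segment.fromTs} --> ${segment.toTs}')
      ..writeln(segment.text.trim())
      ..writeln();
  }
  await File(outPath).writeAsString(buffer.toString());
}
```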
## 💡 Optimization Tips

- **Release Mode:** Always test performance in `--release` mode. Native optimizations (SIMD/Metal) are significantly more effective.
- **Model Quantization:** Use quantized models (e.g., `q4_0`, `q5_0`, or `q2_k`) to reduce RAM usage, especially when using Large-v3-Turbo on mobile devices.
- **Naming Convention for CoreML:** To ensure CoreML detection works, keep the quantization suffix in the filename using the 5-character format (e.g., `ggml-large-v3-turbo-q5_0.bin`). The engine uses this to correctly locate the `-encoder.mlmodelc` directory.
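Relatedly, the `threads` value from the transcribe example is worth tuning per device rather than hard-coding. A minimal heuristic sketch using `dart:io` (the cap of 6 and the 2-core headroom are assumptions, not plugin requirements):

```dart
import 'dart:io';

/// Conservative thread count: leave headroom for the UI isolate
/// and cap at 6, matching the transcribe example above.
int pickThreadCount() {
  final cores = Platform.numberOfProcessors;
  return (cores - 2).clamp(1, 6).toInt();
}
```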
## 🧠 CoreML Acceleration (Optional)
For 3x+ faster transcription on Apple Silicon devices (M1+, A14+), you can optionally add a CoreML encoder:
**What is `.mlmodelc`?**

`.mlmodelc` is a compiled CoreML model directory (not a single file) containing:

- `model.mil` - CoreML model intermediate language
- `coremldata.bin` - Model weights optimized for the Apple Neural Engine
- `metadata.json` - Model configuration

**Important:** `.mlmodelc` is a directory with multiple files, not a single file. This affects how you deploy it.
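Since a missing or half-copied directory is the most common deployment failure, a small sanity check can save debugging time. A sketch in Dart that looks for the three files listed above (real `.mlmodelc` directories may contain additional files; only the documented ones are checked):

```dart
import 'dart:io';

/// Verifies that [path] is a .mlmodelc directory containing the
/// compiled-model files documented above.
Future<bool> looksLikeValidMlmodelc(String path) async {
  final dir = Directory(path);
  if (!await dir.exists()) return false;
  for (final name in ['model.mil', 'coremldata.bin', 'metadata.json']) {
    if (!await File('${dir.path}/$name').exists()) return false;
  }
  return true;
}
```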
**1. Generate CoreML Encoder**

```bash
# Clone whisper.cpp repository
git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp

# Create Python 3.11 environment
python3.11 -m venv venv
source venv/bin/activate

# Install dependencies
pip install torch==2.5.0 "numpy<2.0" coremltools==8.1 openai-whisper ane_transformers

# Generate CoreML encoder (example: large-v3-turbo)
./models/generate-coreml-model.sh large-v3-turbo
# Output: models/ggml-large-v3-turbo-encoder.mlmodelc/ (directory, ~1.2GB)
```
**2. Deploy CoreML Model**

⚠️ **CRITICAL:** `.mlmodelc` cannot be bundled via Flutter assets!

The Flutter asset system doesn't preserve directory structures for custom folders, which breaks CoreML models. Use one of these deployment methods instead:

**Option A: Download at Runtime (Recommended)**
```dart
import 'dart:io';

import 'package:http/http.dart' as http;
import 'package:path_provider/path_provider.dart';

Future<String> downloadCoreMLModel() async {
  final appSupport = await getApplicationSupportDirectory();
  final modelDir = Directory('${appSupport.path}/models');
  await modelDir.create(recursive: true);

  final mlmodelcDir =
      Directory('${modelDir.path}/ggml-large-v3-turbo-encoder.mlmodelc');
  if (!await mlmodelcDir.exists()) {
    // Download and extract the .mlmodelc directory from your server.
    // Each file inside .mlmodelc must be downloaded separately.
    await mlmodelcDir.create(recursive: true);
    await downloadFile('https://your-cdn.com/model.mil', '${mlmodelcDir.path}/model.mil');
    await downloadFile('https://your-cdn.com/coremldata.bin', '${mlmodelcDir.path}/coremldata.bin');
    await downloadFile('https://your-cdn.com/metadata.json', '${mlmodelcDir.path}/metadata.json');
  }

  // Return the GGML model path; whisper.cpp finds the encoder next to it.
  return '${modelDir.path}/ggml-large-v3-turbo-q3_k.bin';
}

/// Minimal helper: fetches [url] and writes the bytes to [destPath].
Future<void> downloadFile(String url, String destPath) async {
  final response = await http.get(Uri.parse(url));
  await File(destPath).writeAsBytes(response.bodyBytes);
}
```
**Option B: iOS/macOS Native Bundle (Advanced)**

For iOS, manually add `.mlmodelc` to the Xcode project:

- Open `ios/Runner.xcworkspace` in Xcode
- Drag the `.mlmodelc` folder into the project navigator
- Ensure "Create folder references" (not "Create groups") is selected
- Add to target: Runner
Then access it via the bundle path:

```dart
import 'dart:io';

String getCoreMLPath() {
  if (Platform.isIOS || Platform.isMacOS) {
    // Xcode bundles .mlmodelc as a folder reference; replace this
    // placeholder with the actual path inside your app bundle.
    return '/path/in/bundle/ggml-large-v3-turbo-encoder.mlmodelc';
  }
  return '';
}
```
**3. Place CoreML Encoder Alongside GGML Model**

```
/app/support/models/
├── ggml-large-v3-turbo-q3_k.bin
└── ggml-large-v3-turbo-encoder.mlmodelc/   ← Must be in the same directory
    ├── model.mil
    ├── coremldata.bin
    └── metadata.json
```
**Naming Convention:**

- GGML model: `ggml-{model-name}-{quantization}.bin`
- CoreML model: `ggml-{model-name}-encoder.mlmodelc/` (base name must match)

Example pairs:

- `ggml-large-v3-turbo-q3_k.bin` + `ggml-large-v3-turbo-encoder.mlmodelc/`
- `ggml-base-q5_0.bin` + `ggml-base-encoder.mlmodelc/`
**4. Use Normally**

```dart
final result = await controller.transcribe(
  model: '/app/support/models/ggml-large-v3-turbo-q3_k.bin',
  audioPath: audioPath,
  lang: 'auto',
);
// whisper.cpp automatically detects and uses the CoreML encoder if present
```
**How Detection Works**

When you load a GGML model (e.g., `ggml-large-v3-turbo-q3_k.bin`), whisper.cpp automatically:

- Strips the quantization suffix: `ggml-large-v3-turbo-q3_k.bin` → `ggml-large-v3-turbo`
- Looks for `ggml-large-v3-turbo-encoder.mlmodelc/` in the same directory
- If found and valid: uses CoreML (NPU) acceleration
- If not found: falls back to Metal (GPU) acceleration

No code changes needed - detection is automatic!
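If you want to verify the pairing from Dart before handing a model to the engine, the lookup above is easy to mirror. A sketch only; the real detection happens natively inside whisper.cpp:

```dart
import 'dart:io';

/// Mirrors the documented lookup: strips a trailing 5-character
/// quantization suffix (e.g. -q5_0, -q3_k) from the .bin name and
/// checks for a matching -encoder.mlmodelc directory alongside it.
Future<bool> coreMlEncoderAvailable(String ggmlModelPath) async {
  final dir = File(ggmlModelPath).parent.path;
  var base = ggmlModelPath.split('/').last.replaceAll(RegExp(r'\.bin$'), '');
  base = base.replaceAll(RegExp(r'-q\d_[0-9k]$'), '');
  return Directory('$dir/$base-encoder.mlmodelc').exists();
}
```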
**Troubleshooting**

CoreML model not detected:

```
[CoreML Debug] whisper_coreml_init called
[CoreML Error] Model file/directory does not exist at path: /path/to/model.mlmodelc
```
Common causes:

- **Wrong path:** `.mlmodelc` must be in the same directory as the `.bin` file
- **Not a directory:** `.mlmodelc` is a directory, not a file - check with a file manager
- **Flutter assets:** Cannot be bundled via `pubspec.yaml` assets - use a runtime download or a native bundle
- **Name mismatch:** Base names must match (e.g., `ggml-base-q5_0.bin` needs `ggml-base-encoder.mlmodelc`)
Check if CoreML is working:

```
[CoreML Debug] CoreML model loaded successfully!
```
If you see this log, CoreML (NPU) is active. Otherwise, Metal (GPU) is used.
**Performance Comparison**
| Acceleration | Device | Speed | Battery | Storage |
|---|---|---|---|---|
| CoreML (NPU) | Apple Silicon | 3-5x faster | Best | +1.2GB |
| Metal (GPU) | iOS/macOS | 2-3x faster | Good | - |
| CPU (SIMD) | Android | 1x (baseline) | Fair | - |
**Recommendation:**
- Large-v3-Turbo: Use CoreML if storage allows - significant speed + battery improvement
- Base/Small models: Metal is sufficient - CoreML overhead not worth it
- Android: CoreML not available - CPU SIMD only
**Notes**

- The CoreML encoder works with all quantization variants (q3_k, q5_0, q8_0, etc.) of the same base model
- If `.mlmodelc` is not present, Metal (GPU) acceleration is used automatically on iOS/macOS
- CoreML requires ~1.2GB of additional storage per model but provides a 3x+ speedup and better battery life
- Android does not support CoreML - CPU optimization only
## 📄 License
MIT License - Based on the original work by sk3llo/whisper_ggml.