Whisper ONNX STT

A Flutter plugin for fully offline Speech-to-Text inference using Whisper models exported to the ONNX format, running on the optimized C/C++ sherpa-onnx framework. It extracts audio natively through FFmpeg from media files FFmpeg can decode (MP4, MP3, M4A, etc.).

Features

  • Fully Offline: Your audio data never leaves the device.
  • Native C/C++ Core: The inference loop runs directly on ONNX Runtime, avoiding Dart overhead.
  • Built-in Media Converter: Pass the path of any video or audio file, and its audio is automatically converted to 16 kHz mono via FFmpeg before inference.
  • Dynamic Download: Keeps your app bundle small by downloading the large (300 MB+) int8-quantized ONNX model files at runtime.
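
For reference, the 16 kHz mono conversion described above corresponds roughly to the FFmpeg invocation sketched below. This is only an illustration: buildResampleArgs is a hypothetical helper, not part of this plugin's API, and the 16-bit PCM WAV output format is an assumption. The plugin performs its own conversion internally, so you do not need to run this yourself.

```dart
/// Hypothetical helper (NOT part of this plugin's API): builds the argument
/// list for an FFmpeg call that extracts 16 kHz mono audio from a media file.
/// Writing 16-bit PCM is an assumption made for this sketch.
List<String> buildResampleArgs(String inputPath, String outputPath) {
  return [
    '-y', // overwrite the output file if it already exists
    '-i', inputPath, // input media (video or audio)
    '-vn', // drop any video stream
    '-ac', '1', // downmix to a single channel (mono)
    '-ar', '16000', // resample to 16 kHz
    '-c:a', 'pcm_s16le', // encode as 16-bit little-endian PCM
    outputPath,
  ];
}

void main() {
  final args = buildResampleArgs('/path/to/my/video.mp4', '/tmp/audio.wav');
  // Print the equivalent command line for inspection.
  print('ffmpeg ${args.join(' ')}');
}
```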

Prerequisites

  1. Sherpa-ONNX Metadata Models: Your Whisper models MUST be exported with the official sherpa-onnx Python export script, which embeds the required metadata (n_mels, model_type, etc.) into the .onnx files. Do NOT use plain Optimum exports or whisper.cpp model files that have not gone through the sherpa-onnx export.

(Note: Downloadable pre-exported model links are coming soon!)

To export a model yourself with Python:

# In your sherpa-onnx directory
python3 scripts/whisper/export-onnx.py --model base
# Then upload the .onnx files to your server.

  2. tokens.txt: Place the tokens.txt that matches your exported model inside the assets/ folder of this plugin.

Usage

Initialization & Model Download:

import 'package:whisper_onnx_stt/whisper_onnx_stt.dart';

final whisperPlugin = WhisperOnnxStt();

// Pass your remote Server URL where the `.onnx` models are currently hosted.
// The plugin expects to find `$baseUrl/base-encoder.int8.onnx` and `$baseUrl/base-decoder.int8.onnx`
await whisperPlugin.ensureModelDownloaded(
  'https://your-server.com/models/whisper/', 
  onProgress: (fileName, progress) {
    print('Downloading $fileName... ${(progress * 100).toInt()}%');
  },
);
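
The file names mentioned in the comment above can be turned into full URLs as sketched below, which is handy for checking that your server actually serves both files before shipping. modelUrls is a hypothetical helper, not part of the plugin, and how the plugin itself treats a trailing slash in the base URL is not documented here; this sketch simply strips it so both spellings give the same result.

```dart
/// Hypothetical helper (NOT part of this plugin's API): returns the encoder
/// and decoder URLs following the `$baseUrl/<model>-encoder.int8.onnx` and
/// `$baseUrl/<model>-decoder.int8.onnx` naming described above.
List<Uri> modelUrls(String baseUrl, {String model = 'base'}) {
  // Strip trailing slashes so 'https://host/models/' and
  // 'https://host/models' produce identical URLs.
  final root = baseUrl.replaceAll(RegExp(r'/+$'), '');
  return [
    Uri.parse('$root/$model-encoder.int8.onnx'),
    Uri.parse('$root/$model-decoder.int8.onnx'),
  ];
}

void main() {
  for (final url in modelUrls('https://your-server.com/models/whisper/')) {
    print(url);
  }
}
```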

Transcribing any media format:

try {
  final transcribedText = await whisperPlugin.transcribeMedia(
    '/path/to/my/video.mp4', 
    language: 'it', // Force transcription language 
  );
  print(transcribedText);
} catch (e) {
  print('Error transcribing: $e');
}

See the example/ directory for a complete implementation.

Libraries

whisper_onnx_stt