meomeo

Text to speech for Dart. Text in, audio out.

Uses espeak-ng for phonemization and ONNX Runtime for neural inference. Supports multiple model formats: KittenTTS, Kokoro, and Piper.

Requirements

Rust toolchain (rustup) — dort compiles ONNX Runtime automatically
C compiler (Xcode on macOS, gcc on Linux) — espeak compiles espeak-ng automatically

Both are compiled once, via Dart Native Assets, the first time you execute dart run.

Setup

1. Add dependencies

dependencies:
  meomeo: ^0.3.2

dev_dependencies:
  espeak: ^0.1.3

2. Compile espeak phoneme data

dart run espeak:compile_data --all --exclude=fo --output ./espeak-data

This downloads the espeak-ng source automatically and compiles phoneme data for 120 languages. fo (Faroese) is excluded because it alone takes 5.4 MB; to include it, drop --exclude=fo and run with --all only.

The compiled data is platform-independent. This step only needs to run once.
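
Since every engine constructor takes an espeakData path, it can help to fail early with a clear message when the compile step has not been run yet. The following is a minimal sketch using only dart:io; hasEspeakData is a hypothetical helper name, not part of meomeo or espeak, and it only checks that the directory exists and is not empty, not that its contents are valid.

```dart
import 'dart:io';

/// Returns true when [path] is an existing, non-empty directory.
/// Hypothetical helper: it does not validate the espeak data itself.
bool hasEspeakData(String path) {
  final dir = Directory(path);
  return dir.existsSync() && dir.listSync().isNotEmpty;
}
```

A caller could check hasEspeakData('./espeak-data') before constructing an engine and, when it returns false, print the compile command from this step.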

3. Download a model

See the engine sections below for download instructions.

Engines

meomeo supports three TTS engines. Each has its own class, model format, and voice system.

KittenTTS

Multi-voice English model. Voices are bundled in a single .npz file.

Download a model:

Nano (15M params, fast — recommended to start):

curl -L -o model.onnx https://huggingface.co/KittenML/kitten-tts-nano-0.8/resolve/main/kitten_tts_nano_v0_8.onnx
curl -L -o voices.npz https://huggingface.co/KittenML/kitten-tts-nano-0.8/resolve/main/voices.npz

Mini (80M params, better quality):

curl -L -o model.onnx https://huggingface.co/KittenML/kitten-tts-mini-0.8/resolve/main/kitten_tts_mini_v0_8.onnx
curl -L -o voices.npz https://huggingface.co/KittenML/kitten-tts-mini-0.8/resolve/main/voices.npz

Usage:

import 'package:meomeo/meomeo.dart';

final meo = MeoKitten(
  model: 'model.onnx',
  voices: 'voices.npz',
  espeakData: './espeak-data',
);

final luna = Speaker(voice: 'Luna');
final pcm = await meo.speak('Hello world', speaker: luna);

saveWav('output.wav', pcm);
meo.dispose();

Voices: Bella, Jasper, Luna, Bruno, Rosie, Hugo, Kiki, Leo

Kokoro

Multi-voice, multi-language model. Voices are individual .bin files in a directory.

import 'package:meomeo/meomeo.dart';

final meo = MeoKokoro(
  model: 'kokoro.onnx',
  voices: './voices',
  espeakData: './espeak-data',
);

final speaker = Speaker(voice: 'af_heart');
final pcm = await meo.speak('Hello world', speaker: speaker);

saveWav('output.wav', pcm);
meo.dispose();

Piper

One ONNX model per voice. Supports 30+ languages with hundreds of community voices.

Each model comes with a config file named after it, with .json appended (for example, vi_VN-vais1000-medium.onnx.json).

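import 'package:meomeo/meomeo.dart';

// Option A: load a single voice
final meo = MeoPiper(
  model: 'vi_VN-vais1000-medium.onnx',
  espeakData: './espeak-data',
);

// Option B: load all voices from a directory instead.
// Use one option or the other; declaring meo twice in one scope does not compile.
// final meo = MeoPiper.dir(
//   path: './piper-voices',
//   espeakData: './espeak-data',
// );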

final speaker = Speaker(voice: 'vi_VN-vais1000-medium');
final pcm = await meo.speak('Xin chào', speaker: speaker);

saveWav('output.wav', pcm);
meo.dispose();
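
The config file that ships with each Piper voice is plain JSON, so it can be inspected before loading a model. The sketch below reads two fields that Piper voice configs commonly carry, audio.sample_rate and espeak.voice. Treat those field names as an assumption to verify against your own config file. PiperVoiceInfo and parsePiperConfig are hypothetical names and are not part of meomeo.

```dart
import 'dart:convert';

/// Subset of a Piper voice config (hypothetical helper type).
class PiperVoiceInfo {
  final int sampleRate;
  final String? espeakVoice;
  const PiperVoiceInfo(this.sampleRate, this.espeakVoice);
}

/// Parses the JSON text of a `<model>.onnx.json` file.
/// Assumes the layout {"audio": {"sample_rate": ...}, "espeak": {"voice": ...}}.
PiperVoiceInfo parsePiperConfig(String jsonText) {
  final root = jsonDecode(jsonText) as Map<String, dynamic>;
  final audio = root['audio'] as Map<String, dynamic>?;
  final espeak = root['espeak'] as Map<String, dynamic>?;

  final rate = audio?['sample_rate'];
  if (rate is! int) {
    throw const FormatException('missing or non-integer audio.sample_rate');
  }
  // The espeak voice is optional in this sketch; null means it was absent.
  return PiperVoiceInfo(rate, espeak?['voice'] as String?);
}
```

To use it, read the file with File(path).readAsStringSync() from dart:io and pass the text to parsePiperConfig.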

Speaker

All engines use the Speaker class to configure voice and speed:

final speaker = Speaker(
  voice: 'Luna',       // must match a loaded voice
  speed: 1.2,          // speech speed multiplier (default: 1.0)
  phonemizer: custom,  // optional custom phonemizer (for language packs)
);

Word timing

Use synthesize() when you need metadata along with the audio, such as word timings. speak() is unchanged and still returns PCM audio directly.

final result = await meo.synthesize(
  'Hello world',
  speaker: luna,
  timing: SpeechTiming.estimatedWords,
);

saveWav('output.wav', result.samples, sampleRate: result.sampleRate);

for (final mark in result.marks) {
  print(
    '${mark.text}: '
    '${mark.startSeconds(result.sampleRate)}s - '
    '${mark.endSeconds(result.sampleRate)}s',
  );
}

SpeechTiming.estimatedWords preserves source text spans, synthesizes each text chunk, then distributes the generated sample range across words by phoneme weight. This is designed for karaoke-style highlighting and subtitle cursors. It is not forced alignment, so exact phoneme or syllable boundaries are not guaranteed.
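
To make the distribution step concrete, here is a minimal sketch of splitting one chunk's sample range across its words in proportion to a per-word weight. It is an illustration of the idea, not meomeo's implementation: WordSpan and distributeByWeight are hypothetical names, the integer weights stand in for phoneme counts, and the cumulative rounding is an assumed choice.

```dart
/// A word's sample range inside one synthesized chunk (hypothetical type).
class WordSpan {
  final String text;
  final int startSample;
  final int endSample; // exclusive
  const WordSpan(this.text, this.startSample, this.endSample);

  double startSeconds(int sampleRate) => startSample / sampleRate;
  double endSeconds(int sampleRate) => endSample / sampleRate;
}

/// Splits [totalSamples] across [words] in proportion to [weights].
List<WordSpan> distributeByWeight(
  List<String> words,
  List<int> weights,
  int totalSamples,
) {
  if (words.length != weights.length) {
    throw ArgumentError('words and weights must have the same length');
  }
  final totalWeight = weights.fold<int>(0, (sum, w) => sum + w);
  if (totalWeight <= 0) return const [];

  final spans = <WordSpan>[];
  var cumulative = 0;
  var start = 0;
  for (var i = 0; i < words.length; i++) {
    cumulative += weights[i];
    // Rounding the cumulative boundary keeps spans contiguous and makes
    // the last span end exactly at totalSamples.
    final end = (totalSamples * cumulative / totalWeight).round();
    spans.add(WordSpan(words[i], start, end));
    start = end;
  }
  return spans;
}
```

Because every word boundary is an estimate derived from the weights, a long word with few phonemes or a pause inside the chunk will shift the boundaries, which is why this approach suits highlighting better than precise alignment.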

Language packs

For languages that need specialized phonemization (beyond espeak-ng), use a language pack:

meomeo_ja — Japanese

import 'package:meomeo_ja/meomeo_ja.dart';

final ja = JapanesePhonemizer.init(dictPath: '/path/to/ipadic');
final yuki = Speaker(voice: 'jf_alpha', phonemizer: ja);

// meo is an engine instance created as in the sections above.
final pcm = await meo.speak('こんにちは世界', speaker: yuki);
ja.dispose();