dart_mlx_ffi
dart_mlx_ffi is a Dart and Flutter FFI package for Apple's
MLX C API.
The package vendors mlx, mlx-c, and the native build pieces needed to
compile a local MLX dynamic library for the current Apple target.
Highlights
- Stable high-level Dart API for arrays, tensor ops, scans, linalg, FFT, quantization, convolutions, streams, runtime helpers, export/import, and custom fast-kernel wrappers
- Full raw binding surface through package:dart_mlx_ffi/raw.dart
- Native build hooks for Apple MLX on iOS and macOS
- Canonical MLX snapshot preparation through the repository's Unsloth MLX wrapper
- Verified parity against Python MLX on deterministic operator suites
- Publish-time parity coverage for text, VLM, TTS, and ASR checkpoints
Platform
- iOS
- macOS
This package targets Apple platforms only.
MLX is most useful on Apple Silicon with Metal available. If the local Xcode
installation does not contain the MetalToolchain component, the build hook
falls back to CPU-only MLX so the package still compiles.
To install the Metal shader toolchain on the build machine:
xcodebuild -downloadComponent MetalToolchain
Installation
dart pub add dart_mlx_ffi
Public Entry Points
- package:dart_mlx_ffi/dart_mlx_ffi.dart: stable MLX tensor/runtime API
- package:dart_mlx_ffi/models.dart: stable Dart model runners shipped by this repository
- package:dart_mlx_ffi/raw.dart: generated low-level mlx-c bindings
Quick Start
import 'package:dart_mlx_ffi/dart_mlx_ffi.dart';

void main() {
  // Build two 2x2 float32 matrices from flat data.
  final a = MlxArray.fromFloat32List([1, 2, 3, 4], shape: [2, 2]);
  final b = MlxArray.fromFloat32List([5, 6, 7, 8], shape: [2, 2]);

  // Matrix product, then a sum over the result.
  final c = mx.matmul(a, b);
  final s = c.sum();

  print(MlxVersion.current());
  print(MlxDevice.defaultDevice());
  print(c.toList());
  print(s.toList());

  // Close every array once it is no longer needed.
  s.close();
  c.close();
  b.close();
  a.close();
}
MLX Snapshot Workflow
This repository uses a canonical MLX conversion wrapper.
Use it when you want to:
- prepare a local MLX snapshot from a Hugging Face checkpoint
- standardize publish-time benchmark inputs
- keep local evaluation reproducible across machines
That wrapper produces MLX snapshots that can be used directly by:
- Dart model runners under lib/src/models/
- export/import tooling under models/text_lm/
- publish-time parity scripts under benchmark/
For Gemma 4, the current publish-time text coverage uses the official
Unsloth MLX snapshot unsloth/gemma-4-E2B-it-UD-MLX-4bit directly instead of
re-quantizing locally, because Unsloth currently ships gemma4 model patches
for mlx-lm as a separate install step.
Exporting Text Model Bundles
The repository includes a Python helper, models/text_lm/export_bundle.py, for
turning an mlx-lm snapshot into a shapeless .mlxfn artifact plus matching
sample inputs.
Example:
uv sync
uv run python models/text_lm/export_bundle.py \
--snapshot-dir /path/to/mlx-snapshot \
--output-dir /path/to/out-bundle
Outputs:
- /path/to/out-bundle/function.mlxfn
- /path/to/out-bundle/inputs.safetensors
The export is shapeless, so the imported function accepts variable-length
input_ids tensors.
The generic Dart runner for exported artifacts is:
dart run models/common/import_run.dart \
/path/to/out-bundle/function.mlxfn \
/path/to/out-bundle/inputs.safetensors
Model Workflows
There are three main model-workflow areas in this repository:
- lib/src/models/ contains the main stable Dart model implementations
- models/ contains reusable non-runtime export and artifact tooling
- benchmark/ contains publish-time parity runners and report generation
Current stable Dart model implementations under lib/src/models/
include:
- parakeet_tdt
- qwen2_5
- qwen3_5
- kitten_tts
- shared helpers
Current publish-time validation under benchmark/ is organized
as a release matrix instead of a grab bag of local experiments.
Recommended prepublish text coverage:
- unsloth/gemma-4-E2B-it-UD-MLX-4bit
- mlx-community/Qwen3.5-27B-4bit-DWQ
- mlx-community/translategemma-27b-it-4bit
- mlx-community/NVIDIA-Nemotron-3-Nano-30B-A3B-4bit
- mlx-community/GLM-4.7-Flash-4bit
Recommended prepublish multimodal / speech coverage:
- mlx-community/MiniCPM-o-4_5-4bit
- mlx-community/Gemma-SEA-LION-v4-4B-VL-mlx-3bit
- mlx-community/Ming-omni-tts-0.5B-4bit
- mlx-community/kitten-tts-nano-0.8-6bit
- mlx-community/parakeet-tdt-0.6b-v3
Validation
Deterministic operator parity currently covers 114 checks across arithmetic,
tensor ops, scans, convolutions, linalg, fast ops, quantization, and random
APIs, with 0 failures on the benchmark machine.
Benchmark Environment
- Date: 2026-04-04
- Machine: MacBook Pro (Mac16,5)
- Chip: Apple M4 Max
- CPU cores: 16 (12 performance + 4 efficiency)
- Memory: 128 GB
- OS: macOS 26.4 (25E5223i)
- Kernel: Darwin 25.4.0
- Dart SDK: 3.11.1
- Python: 3.12 via uv
- MLX runtime: 0.31.1
Latest Runtime Snapshot
Latest measured runtime snapshot on the benchmark machine, refreshed on
2026-04-04:
Text models:
| Model | Python MLX ms | Dart MLX ms | Max abs diff |
|---|---|---|---|
| gemma-4-E2B-it-UD-MLX-4bit | 30.47 | 34.30 | 0 |
| Qwen3.5-27B-4bit-DWQ | 172.81 | 170.25 | 0 |
| translategemma-27b-it-4bit | 166.52 | 170.46 | 0 |
| NVIDIA-Nemotron-3-Nano-30B-A3B-4bit | 36.62 | 35.67 | 0 |
| GLM-4.7-Flash-4bit | 46.61 | 45.81 | 0 |
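The two timing columns are easier to compare as a relative difference. The sketch below is only arithmetic on the numbers in the text-model table; the helper name `relative_diff_percent` is made up for illustration and is not part of the repository's tooling. A positive value means the Dart run was slower than the Python run, a negative value means it was faster.

```python
# (python_ms, dart_ms) pairs copied from the text-model table above.
TEXT_MODEL_MS = {
    "gemma-4-E2B-it-UD-MLX-4bit": (30.47, 34.30),
    "Qwen3.5-27B-4bit-DWQ": (172.81, 170.25),
    "translategemma-27b-it-4bit": (166.52, 170.46),
    "NVIDIA-Nemotron-3-Nano-30B-A3B-4bit": (36.62, 35.67),
    "GLM-4.7-Flash-4bit": (46.61, 45.81),
}


def relative_diff_percent(python_ms: float, dart_ms: float) -> float:
    """Dart time relative to Python time, as a signed percentage."""
    return (dart_ms - python_ms) / python_ms * 100.0


# Hypothetical helper output: one signed percentage per model.
relative = {
    name: relative_diff_percent(python_ms, dart_ms)
    for name, (python_ms, dart_ms) in TEXT_MODEL_MS.items()
}

for name, pct in relative.items():
    print(f"{name}: {pct:+.2f}%")
```

These are single measurements from one machine, so small differences in either direction should be read with care.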
Non-text models:
| Model | Kind | Python MLX ms | Dart MLX ms | Max abs diff | Notes |
|---|---|---|---|---|---|
| MiniCPM-o-4_5-4bit | vlm | 130.82 | 131.58 | 0 | synthetic image + prompt |
| Gemma-SEA-LION-v4-4B-VL-mlx-3bit | vlm | 718.60 | 756.92 | 0 | synthetic image + prompt |
| Ming-omni-tts-0.5B-4bit | tts | 4.59 | 4.85 | 0 | deterministic forward_with_cfg |
| kitten-tts-nano-0.8-6bit | tts | 66.25 | 69.20 | 1.19e-07 | full waveform |
| parakeet-tdt-0.6b-v3 | asr | 30.95 | 29.72 | 5.72e-06 | transcript matched |
What Max abs diff Means
Max abs diff is the maximum absolute difference between the Python MLX output
and the Dart MLX output for the compared tensor.
Examples:
- 0 means the compared tensor matched exactly at the chosen dtype
- 7.62939453125e-06 means the worst element differed by about 0.00000763
- for text and VLM rows, the compared tensor is the final-token logits[:16]
- for parakeet-tdt-0.6b-v3, the compared tensor is the first-step token_logits[:16] + duration_logits
- for Ming-omni-tts-0.5B-4bit, the compared tensor is the deterministic forward_with_cfg subgraph output
- for kitten-tts-nano-0.8-6bit, the compared tensor is the full waveform
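The metric itself is simple to state in code. The following is a minimal sketch in plain Python, not the repository's parity script: the function name `max_abs_diff` and the sample values are assumptions chosen for illustration, and the real comparison runs on MLX tensors rather than Python lists.

```python
from typing import Sequence


def max_abs_diff(reference: Sequence[float], candidate: Sequence[float]) -> float:
    """Largest element-wise |reference - candidate| over two equal-length vectors."""
    if len(reference) != len(candidate):
        raise ValueError("compared tensors must have the same number of elements")
    return max((abs(r - c) for r, c in zip(reference, candidate)), default=0.0)


# Illustrative values only; these are not real model outputs.
python_logits = [0.5, -1.25, 3.0, 2.0]
dart_logits = [0.5, -1.25, 3.0, 2.0]

# Identical outputs: the metric is exactly 0. The [:16] slice mirrors the
# logits[:16] window used for the text and VLM rows.
exact = max_abs_diff(python_logits[:16], dart_logits[:16])

# One element off by 2**-17 (= 7.62939453125e-06): the metric reports that
# single worst element, however many other elements match.
drifted_logits = [0.5, -1.25, 3.0 + 2**-17, 2.0]
drift = max_abs_diff(python_logits, drifted_logits)

print(exact, drift)
```

Because the metric is a maximum and not an average, one outlier element determines the reported value.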
Reproduce The Release Matrix Report
Generate the publish-time report with warmup=3 and iters=10:
uv sync
HF_HUB_DISABLE_XET=1 uv run --no-project --with mlx-lm --with pillow --with mlx-vlm --with mlx-audio --with parakeet-mlx python benchmark/publish_report.py
The aggregated results are written to:
benchmark/out/publish_report.json
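For readers unfamiliar with the warmup/iters convention, the sketch below shows one common way to structure such a measurement. It is an assumption about the general technique, not a copy of benchmark/publish_report.py: the helper `time_ms`, the stand-in workload, and the choice of mean and median as summary statistics are all hypothetical.

```python
import statistics
import time
from typing import Callable, List


def time_ms(fn: Callable[[], object], warmup: int = 3, iters: int = 10) -> List[float]:
    """Run fn `warmup` times untimed, then return `iters` timed samples in ms."""
    for _ in range(warmup):
        fn()  # warmup runs absorb one-time costs and are discarded
    samples: List[float] = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


# Stand-in workload. With MLX, which evaluates lazily, the timed callable
# would also need to force evaluation of its result (for example with mx.eval).
calls: List[int] = []
samples = time_ms(lambda: calls.append(sum(range(10_000))))

summary = {
    "iters": len(samples),
    "mean_ms": statistics.fmean(samples),
    "median_ms": statistics.median(samples),
}
print(summary)
```

Discarding the warmup runs keeps one-time setup cost out of the reported numbers, which matters most for the short-running rows in the tables.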
Useful focused runs:
# full-waveform KittenTTS comparison
uv run python benchmark/kitten_tts/mlx_audio_compare.py --warmup 3 --iters 10
# fixed-mel Parakeet TDT comparison
uv run --no-project --with parakeet-mlx --with numpy python - <<'PY'
from benchmark.parakeet_tdt_sweep import asr_bench
import json
print(json.dumps(asr_bench('mlx-community/parakeet-tdt-0.6b-v3', warmup=1, iters=1), indent=2))
PY
Development
Regenerate the raw bindings:
dart run ffigen --config ffigen.yaml
Typical local verification:
dart analyze
dart test
dart pub publish --dry-run
Benchmark tooling uses uv:
uv sync
Notes
- This package targets Apple platforms only.
- The raw layer remains the escape hatch for the full MLX C surface.
Libraries
- dart_mlx_ffi
- Stable high-level MLX API for Dart and Flutter on Apple platforms.
- dart_mlx_ffi_bindings_generated
- models
- raw
- Complete low-level mlx-c FFI bindings generated from vendored headers.