executorch_flutter 0.5.0-rc.3

ExecuTorch on-device ML inference for Flutter using dart:ffi — vision models plus experimental streaming LLM (Gemma 4). Android, iOS, macOS, Linux, Windows, Web.

ExecuTorch Flutter #

A Flutter plugin for on-device ML inference using PyTorch ExecuTorch, supporting Android, iOS, macOS, Windows, Linux, and Web.

pub.dev | Live Demo | Example App



Overview #

ExecuTorch Flutter provides a Dart API for loading and running ExecuTorch models (.pte files) in Flutter applications. The package handles the native platform integration, so application code only loads a model, passes input tensors, and reads output tensors.

Features #

  • Cross-Platform: Android (API 23+), iOS (13.0+), macOS (11.0+), Windows, Linux, and Web
  • Type-Safe API: dart:ffi bindings with type-safe Dart wrapper classes
  • Async Operations: Non-blocking model loading and inference
  • Multiple Models: Support for concurrent model instances
  • Error Handling: Structured exception handling with clear error messages
  • Backend Support: XNNPACK, CoreML, MPS, Vulkan backends
  • On-device LLM (experimental): streaming text generation with Gemma 4 (XNNPACK CPU + MLX Apple-GPU) — see docs/LLM.md
  • Live Camera: Real-time inference with camera stream support

Library Size by Backend #

📊 Download Release Size Comparison (SVG) | Download Debug Size Comparison (SVG) | JSON Report


Installation #

Requirements: Flutter 3.38+ (first version with native assets hooks)

dependencies:
  executorch_flutter: ^0.5.0-rc.3

Quick Start #

1. Load a Model #

import 'package:executorch_flutter/executorch_flutter.dart';

// Load from Flutter assets (recommended - works on all platforms)
final model = await ExecuTorchModel.loadFromAsset('assets/models/model.pte');

2. Run Inference #

final inputTensor = TensorData(
  shape: [1, 3, 224, 224],
  dataType: TensorType.float32,
  data: yourImageBytes,
);

final outputs = await model.forward([inputTensor]);

for (var output in outputs) {
  print('Shape: ${output.shape}, Type: ${output.dataType}');
}
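
Because TensorData.data holds raw bytes, float inputs and outputs have to be converted to and from Uint8List. The following is a minimal sketch using only dart:typed_data. The helper names floatsToBytes and bytesToFloats are hypothetical (they are not part of the package), and the sketch assumes float32 data in host byte order.

```dart
import 'dart:typed_data';

/// Packs float values into raw bytes (host byte order) for a float32 tensor.
Uint8List floatsToBytes(List<double> values) {
  final floats = Float32List.fromList(values);
  return floats.buffer.asUint8List(floats.offsetInBytes, floats.lengthInBytes);
}

/// Reads float32 values back out of raw tensor bytes.
/// Goes through ByteData so it also works when the view is not 4-byte aligned.
Float32List bytesToFloats(Uint8List bytes) {
  final view = ByteData.sublistView(bytes);
  final out = Float32List(bytes.lengthInBytes ~/ 4);
  for (var i = 0; i < out.length; i++) {
    out[i] = view.getFloat32(i * 4, Endian.host);
  }
  return out;
}
```

Under these assumptions, the result of floatsToBytes could be passed as data when constructing an input tensor, and bytesToFloats could be applied to the data of a float32 output.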

3. Clean Up #

await model.dispose();

Model Loading Options #

| Method | Platforms | Use Case |
|---|---|---|
| loadFromAsset(path) | All (including web) | Bundled assets |
| loadFromBytes(bytes) | All (including web) | Downloaded/cached models |
| load(filePath) | Native only | External file paths |

import 'package:flutter/services.dart' show rootBundle;

// From bytes
final byteData = await rootBundle.load('assets/models/model.pte');
final modelFromBytes = await ExecuTorchModel.loadFromBytes(byteData.buffer.asUint8List());

// From file path (native platforms only)
final modelFromFile = await ExecuTorchModel.load('/path/to/model.pte');

Platform Support #

| Platform | Min Version | Architectures | Backends |
|---|---|---|---|
| Android | API 23 | arm64-v8a, armeabi-v7a, x86_64, x86 | XNNPACK, Vulkan* |
| iOS | 13.0+ | arm64, x86_64+arm64 (sim) | XNNPACK, CoreML, MPS, Vulkan* |
| macOS | 11.0+ | arm64, x86_64 | XNNPACK, CoreML, MPS, Vulkan* |
| Windows | 10+ | x64 | XNNPACK, Vulkan* |
| Linux | Ubuntu 20.04+ | x64, arm64 | XNNPACK, Vulkan* |
| Web | Modern browsers | WebAssembly | XNNPACK (Wasm SIMD) |

*Vulkan is opt-in and experimental. See Vulkan Backend.

Platform Configuration #

If you encounter deployment target errors, update your project settings:

iOS Deployment Target (iOS 13.0+)
  1. Open ios/Runner.xcworkspace in Xcode
  2. Select Runner target → Build Settings
  3. Search "iOS Deployment Target" → Set to 13.0

macOS Deployment Target (macOS 11.0+)
  1. Open macos/Runner.xcworkspace in Xcode
  2. Select Runner target → Build Settings
  3. Search "macOS Deployment Target" → Set to 11.0

After updating, run:

flutter clean && flutter pub get && flutter build <platform>

API Reference #

ExecuTorchModel #

// Load methods
static Future<ExecuTorchModel> loadFromAsset(String assetPath)
static Future<ExecuTorchModel> loadFromBytes(Uint8List modelBytes)
static Future<ExecuTorchModel> load(String filePath)  // Native only

// Inference
Future<List<TensorData>> forward(List<TensorData> inputs)

// Lifecycle
Future<void> dispose()
bool get isDisposed

ExecuTorchLLM (experimental) #

On-device generative text — Google Gemma 4 E2B — with token-by-token streaming, separate from the tensor API. Loaded from file paths (weights are 1+ GB) and driven by a stateful decode loop + tokenizer + KV cache. Backends: XNNPACK (CPU, all platforms) and MLX (Apple-Silicon GPU, macOS arm64).

// Load (file paths; mlxMetallibPath is MLX-only)
static Future<ExecuTorchLLM> load({
  required String modelPath,
  required String tokenizerPath,
  String? dataPath,
  String? mlxMetallibPath,
})

// Stream tokens as they decode
Stream<String> generate(String prompt, {GenConfig config})

// Control / lifecycle
void stop();              // cooperative cancel mid-generation
void reset();             // clear KV cache / start a new conversation
Future<void> dispose();   // release native resources

// GenConfig — temperature-only sampling (no top-p/top-k)
const GenConfig({int maxNewTokens, int seqLen, double temperature, bool echo, bool ignoreEos});

// Example: stream a reply to stdout (requires import 'dart:io')
final llm = await ExecuTorchLLM.load(
  modelPath: '/path/gemma-4-E2B-it_xnnpack.pte',
  tokenizerPath: '/path/gemma-4-E2B-it_tokenizer.json',
);
// Gemma 4 needs its turn markers around the message:
final prompt = '<bos><|turn>user\nExplain Flutter in one line.<turn|>\n<|turn>model\n';
await for (final piece in llm.generate(prompt,
    config: const GenConfig(maxNewTokens: 512, temperature: 0))) {
  stdout.write(piece);
}
await llm.dispose();
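
Since every prompt needs the same turn markers, it can help to wrap them in a small function. This is a minimal sketch for a single user turn only, copying the marker layout from the example above. The name buildGemmaPrompt is hypothetical, and multi-turn conversations should follow the chat template described in docs/LLM.md rather than this helper.

```dart
/// Wraps a single user message in the Gemma 4 turn markers shown above.
/// Single-turn only; this does not implement the full chat template.
String buildGemmaPrompt(String userMessage) {
  return '<bos><|turn>user\n${userMessage}<turn|>\n<|turn>model\n';
}
```

The returned string can then be passed as the prompt argument of generate.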

Enable it in pubspec.yaml (hooks.user_defines.executorch_flutter):

llm: true
backends: [xnnpack, mlx]   # mlx is auto-dropped off macOS-arm64

📖 Full guide: docs/LLM.md — model export (the Gemma 4 scripts), the chat template, the MLX mlx.metallib shipping step, stopping, platform support, and troubleshooting. A complete streaming chat screen is in example/lib/screens/llm_chat_screen.dart.

TensorData #

final tensor = TensorData(
  shape: [1, 3, 224, 224],       // Dimensions
  dataType: TensorType.float32,  // float32, int32, int8, uint8
  data: Uint8List(...),          // Raw bytes
  name: 'input_0',               // Optional
);
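
Because data is a raw byte list, its length has to agree with the shape and the element size. The sketch below computes the expected length so a mismatch can be caught before inference. The function name and the string keys are hypothetical stand-ins for the package's TensorType values; the element sizes are the standard widths of the four listed types.

```dart
/// Bytes per element for the data types listed above.
/// The string keys are illustrative; map them from TensorType in real code.
const bytesPerElement = <String, int>{
  'float32': 4,
  'int32': 4,
  'int8': 1,
  'uint8': 1,
};

/// Number of bytes a tensor of [shape] and [dataType] should occupy.
int expectedByteLength(List<int> shape, String dataType) {
  final size = bytesPerElement[dataType];
  if (size == null) {
    throw ArgumentError.value(dataType, 'dataType', 'unsupported type');
  }
  // Element count is the product of all dimensions.
  final elements = shape.fold<int>(1, (count, dim) => count * dim);
  return elements * size;
}
```

For the [1, 3, 224, 224] float32 tensor above, this gives 1 × 3 × 224 × 224 × 4 = 602,112 bytes.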

BackendQuery #

Query available backends at runtime:

// Check specific backend
if (BackendQuery.isAvailable(Backend.coreml)) {
  model = await ExecuTorchModel.loadFromAsset('assets/model_coreml.pte');
} else {
  model = await ExecuTorchModel.loadFromAsset('assets/model_xnnpack.pte');
}

// List all available backends
final backends = BackendQuery.available;
print('Available: ${backends.map((b) => b.displayName).join(", ")}');

| Backend | Display Name | Platforms |
|---|---|---|
| Backend.xnnpack | XNNPACK | All |
| Backend.coreml | CoreML | iOS, macOS |
| Backend.mps | Metal Performance Shaders | iOS, macOS |
| Backend.vulkan | Vulkan | Android, iOS, macOS, Windows, Linux |
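
When an app ships more than two model variants, the if/else above can be generalized into a priority list. The following is a sketch of that selection logic in plain Dart. It works on backend names as strings (for example, the display names in the table) so that it stays independent of the package; pickModelAsset and its parameters are hypothetical.

```dart
/// Returns the asset for the first preferred backend that is available,
/// or [fallbackAsset] when none of them are.
String pickModelAsset(
  Set<String> availableBackends,
  Map<String, String> assetByBackend, {
  required String fallbackAsset,
}) {
  // Map literals keep insertion order, so the entries act as a priority list.
  for (final entry in assetByBackend.entries) {
    if (availableBackends.contains(entry.key)) return entry.value;
  }
  return fallbackAsset;
}
```

One way to use it would be to build availableBackends from the display names of BackendQuery.available and to pass the XNNPACK model as the fallback, since XNNPACK is listed for all platforms.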

Exception Hierarchy #

ExecuTorchException (base)
├── ExecuTorchModelException      // Model loading/lifecycle
├── ExecuTorchInferenceException  // Inference execution
├── ExecuTorchValidationException // Tensor validation
├── ExecuTorchMemoryException     // Memory/resources
├── ExecuTorchIOException         // File I/O
└── ExecuTorchPlatformException   // Platform communication

Build Configuration #

Configure the native build in your app's pubspec.yaml:

hooks:
  user_defines:
    executorch_flutter:
      debug: false              # Enable debug logging
      build_mode: "prebuilt"    # "prebuilt", "local", or "source"
      # prebuilt_version: "1.1.0.7"  # Optional: pin specific native version
      # For source mode: build from local ExecuTorch checkout
      # build_mode: "source"
      # executorch_source: "/path/to/executorch"
      # For local mode: point at pre-compiled libraries
      # local_lib_dir: "/path/to/compiled/libs"
      backends:
        - xnnpack
        - coreml
        - mps

Options #

| Option | Default | Description |
|---|---|---|
| debug | false | Debug logging + debug binaries |
| build_mode | "prebuilt" | "prebuilt" (fast), "local" (pre-compiled), or "source" (from source) |
| prebuilt_version | Current | Prebuilt release version |
| executorch_source | - | Path to local ExecuTorch checkout (source mode) |
| local_lib_dir | - | Path to pre-compiled libraries (local mode) |
| backends | Platform-specific | Backends to enable |

Default Backends by Platform #

| Platform | Defaults |
|---|---|
| Android | xnnpack |
| iOS | xnnpack, coreml, mps |
| macOS | xnnpack, coreml, mps |
| Windows/Linux | xnnpack |

Environment Variables #

| Variable | Description |
|---|---|
| EXECUTORCH_BUILD_MODE | Override build mode (prebuilt, local, source) |
| EXECUTORCH_SOURCE_DIR | Path to local ExecuTorch checkout (source mode) |
| EXECUTORCH_INSTALL_DIR | Path to pre-compiled libraries (local mode) |
| EXECUTORCH_CACHE_DIR | Custom cache directory for source builds |
| EXECUTORCH_DISABLE_DOWNLOAD | Skip prebuilt download |

Advanced Usage #

Preprocessing Strategies #

The example app demonstrates three preprocessing approaches:

| Strategy | Performance | Platforms | Dependencies |
|---|---|---|---|
| GPU Shader | ~75ms (web), comparable to OpenCV (native) | All | None |
| OpenCV | Very fast | Native only | opencv_dart |
| CPU (image lib) | ~560ms (web), slower | All | image |

GPU Preprocessing Tutorial - Step-by-step guide with GLSL shader examples.
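
To make the CPU strategy concrete, the sketch below shows the core of a typical image preprocessing step in plain Dart: converting interleaved RGBA pixels into planar, normalized float32 values in channel-first (CHW) order. It is not the example app's implementation. The function name is hypothetical, and the default mean and std are the commonly used ImageNet statistics, which is an assumption; use whatever normalization your model was trained with.

```dart
import 'dart:typed_data';

/// Converts interleaved RGBA pixels (HWC, 0-255) into planar CHW float32
/// values, normalized per channel as (value / 255 - mean) / std.
Float32List rgbaToChwFloat32(
  Uint8List rgba,
  int width,
  int height, {
  List<double> mean = const [0.485, 0.456, 0.406], // assumed ImageNet stats
  List<double> std = const [0.229, 0.224, 0.225], // assumed ImageNet stats
}) {
  final pixels = width * height;
  if (rgba.length != pixels * 4) {
    throw ArgumentError('Expected ${pixels * 4} RGBA bytes, got ${rgba.length}');
  }
  final out = Float32List(3 * pixels);
  for (var p = 0; p < pixels; p++) {
    for (var c = 0; c < 3; c++) {
      // Channel c of pixel p; the alpha byte at offset 3 is skipped.
      final value = rgba[p * 4 + c] / 255.0;
      out[c * pixels + p] = (value - mean[c]) / std[c];
    }
  }
  return out;
}
```

The per-pixel loop runs on the CPU in Dart, which is consistent with this strategy being the slowest of the three; the GPU shader and OpenCV strategies move the same arithmetic off the Dart thread.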


Web Platform #

Web runs via WebAssembly with XNNPACK backend.

Performance #

| Metric | Native | Web (Wasm) |
|---|---|---|
| YOLO11n Inference | ~50-100ms | ~622ms |
| Total E2E | ~150-200ms | ~855ms |

When to use Web:

  • Demos and prototyping
  • Interactive inference (sub-second)
  • No app install required

Not recommended for:

  • Real-time camera inference
  • High-throughput batch processing
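
The recommendation follows from simple arithmetic on the end-to-end latencies in the table. As a rough sketch, assuming frames are processed one at a time with no pipelining, the achievable frame rate is bounded by the inverse of the latency (maxFps is a hypothetical helper name):

```dart
/// Upper bound on frames per second when each frame takes [latencyMs]
/// end to end and frames are processed one at a time.
double maxFps(double latencyMs) => 1000.0 / latencyMs;
```

With the figures above, ~855ms on the web works out to about 1.2 frames per second, while 150-200ms on native platforms gives roughly 5 to 6.7 frames per second.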

Setup #

  1. Run setup script:

    dart run executorch_flutter:setup_web
    
  2. Add to web/index.html:

    <head>
      <script src="js/executorch_wrapper.js"></script>
    </head>
    
  3. Use XNNPACK models (same as native).


Example Application #

The example/ directory includes:

  • Unified Model Playground - Multiple model types in one interface
  • MobileNet V3 - Image classification (1000 ImageNet classes)
  • YOLO - Object detection (v5, v8, v11)
  • Camera Mode - Real-time inference
  • Settings - Thresholds, preprocessing, performance overlay

To run the example app:

cd example
flutter run -d macos  # or ios, android, windows, linux, chrome

Converting PyTorch Models to ExecuTorch #

To convert your own PyTorch models to the .pte format, follow the Official ExecuTorch Export Guide.

Example app models are hosted at executorch_flutter_models and downloaded automatically.

To export manually:

cd models/python
python3 main.py

LLM (Gemma 4) models are exported with dedicated scripts (they need a tokenizer + quantization recipe, not the tensor export path):

python models/python/export_gemma4_xnnpack.py   # CPU model (all platforms)
python models/python/export_gemma4_mlx.py        # Apple-GPU model (macOS)

See docs/LLM.md for the full export recipe, the required tokenizer.json / mlx.metallib, and how to load them with ExecuTorchLLM.


Troubleshooting #

Model loading fails

  • Verify asset is listed in pubspec.yaml
  • Check model bytes: modelBytes.lengthInBytes > 0
  • Re-export with correct ExecuTorch version

Inference returns error

  • Check model.inputShapes / model.outputShapes
  • Verify tensor data types match expectations
  • Ensure shapes match exactly (including batch dimension)

Memory issues

  • Always call dispose() when done
  • Don't load too many models simultaneously
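
For shape problems in particular, a small pre-flight check can turn a generic inference error into a precise message, for example when the batch dimension was forgotten. The following sketch is plain Dart; describeShapeMismatch is a hypothetical helper, not a package API.

```dart
/// Returns null when [actual] matches [expected] exactly, otherwise a
/// message describing the first difference.
String? describeShapeMismatch(List<int> expected, List<int> actual) {
  if (expected.length != actual.length) {
    return 'rank ${actual.length} != expected rank ${expected.length}';
  }
  for (var i = 0; i < expected.length; i++) {
    if (expected[i] != actual[i]) {
      return 'dim $i is ${actual[i]}, expected ${expected[i]}';
    }
  }
  return null;
}
```

Comparing the shape a model expects with the shape of the tensor about to be passed to forward, and logging any non-null result, is one way to apply it.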

Experimental: Vulkan Backend #

Warning: Vulkan is experimental and opt-in.

Status #

| Platform | Status |
|---|---|
| Android | Works on most devices; see #26 for PowerVR GPU status |
| Windows/Linux | Generally functional |
| macOS/iOS | Works via MoltenVK (Vulkan-to-Metal translation) |

Enable Vulkan #

hooks:
  user_defines:
    executorch_flutter:
      backends:
        - xnnpack
        - vulkan

Vulkan Troubleshooting #

"uniform data allocation exceeded" on Android

This can occur when Vulkan tensor metadata exceeds the per-tensor uniform buffer limit. Fix submitted upstream: pytorch/executorch#17294.

Vulkan on PowerVR GPUs

Some PowerVR devices may produce incorrect Vulkan results due to texture dimension limits. Being tracked upstream: pytorch/executorch#17299. XNNPACK is recommended as a fallback.

Recommendations #

  • Production: Use XNNPACK (stable everywhere)
  • Apple platforms: Use CoreML or MPS instead of Vulkan
  • Testing: Report issues with device info and logs

Report Vulkan Issues


Contributing #

Contributions welcome! See CONTRIBUTING.md for guidelines.

Acknowledgments #

  • opencv_dart - Referenced for understanding Flutter native assets build patterns and cross-platform FFI packaging

License #

MIT License - see LICENSE.

Built with love for the Flutter and PyTorch communities.

Publisher: zcreations.info (verified publisher)


Dependencies

code_assets, ffi, flutter, flutter_web_plugins, hooks, image, logging, meta, native_toolchain_cmake, path_provider
