ExecuTorch Flutter #

A Flutter plugin for on-device ML inference using PyTorch ExecuTorch, supporting Android, iOS, macOS, Windows, Linux, and Web.

pub.dev | Live Demo | Example App

Table of Contents #

Overview
Features
Installation
Quick Start
Platform Support
API Reference
On-device LLM (Gemma 4)
Build Configuration
Advanced Usage
Web Platform
Example Application
Model Export
Troubleshooting
Vulkan Backend (Experimental)
Contributing
License

Overview #

ExecuTorch Flutter provides a Dart API for loading and running ExecuTorch models (.pte files) in Flutter applications. The package handles the native platform integration, so your app works with a single interface for on-device machine learning inference.

Features #

Cross-Platform: Android (API 23+), iOS (13.0+), macOS (11.0+), Windows, Linux, and Web
Type-Safe API: dart:ffi bindings with type-safe Dart wrapper classes
Async Operations: Non-blocking model loading and inference
Multiple Models: Support for concurrent model instances
Error Handling: Structured exception handling with clear error messages
Backend Support: XNNPACK, CoreML, MPS, Vulkan backends
On-device LLM (experimental): Streaming text generation with Gemma 4 (XNNPACK on CPU, MLX on Apple GPU); see docs/LLM.md
Live Camera: Real-time inference with camera stream support

Library Size by Backend #

📊 Download Release Size Comparison (SVG) | Download Debug Size Comparison (SVG) | JSON Report

Installation #

Requirements: Flutter 3.38+ (first version with native assets hooks)

dependencies:
  executorch_flutter: ^0.5.0-rc.3

Quick Start #

1. Load a Model #

import 'package:executorch_flutter/executorch_flutter.dart';

// Load from Flutter assets (recommended - works on all platforms)
final model = await ExecuTorchModel.loadFromAsset('assets/models/model.pte');

2. Run Inference #

final inputTensor = TensorData(
  shape: [1, 3, 224, 224],
  dataType: TensorType.float32,
  data: yourImageBytes,
);

final outputs = await model.forward([inputTensor]);

for (var output in outputs) {
  print('Shape: ${output.shape}, Type: ${output.dataType}');
}
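
The loop above only prints shapes. To read a classification result, the raw output bytes have to be interpreted as numbers. The following is a minimal sketch, assuming the output tensor's `data` is a `Uint8List` of float32 values in host byte order (the byte order is an assumption, not something this README states). `argmaxFloat32` is a hypothetical helper name, not part of the package.

```dart
import 'dart:typed_data';

/// Returns the index of the largest float32 value stored in [rawBytes],
/// or -1 when no value is greater than negative infinity (for example,
/// when the input is empty).
int argmaxFloat32(Uint8List rawBytes) {
  // ByteData reads work at any byte offset, so no 4-byte alignment is needed.
  final view = ByteData.sublistView(rawBytes);
  var bestIndex = -1;
  var bestValue = double.negativeInfinity;
  for (var offset = 0; offset + 4 <= rawBytes.lengthInBytes; offset += 4) {
    // Assumption: values are stored in the host's byte order.
    final value = view.getFloat32(offset, Endian.host);
    if (value > bestValue) {
      bestValue = value;
      bestIndex = offset ~/ 4;
    }
  }
  return bestIndex;
}
```

For a classifier, something like `argmaxFloat32(outputs.first.data)` would then give the top-1 class index, provided the first output holds the class scores.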

3. Clean Up #

await model.dispose();
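
If inference throws, a plain `dispose()` call placed after it is never reached. One way to guarantee cleanup is a small try/finally wrapper. This is a generic sketch, and `withResource` is a hypothetical helper that does not ship with the package.

```dart
/// Opens a resource, runs [use] with it, and always calls [close] afterwards.
Future<T> withResource<R, T>({
  required Future<R> Function() open,
  required Future<T> Function(R resource) use,
  required Future<void> Function(R resource) close,
}) async {
  final resource = await open();
  try {
    return await use(resource);
  } finally {
    // Runs on both success and failure, so the resource is always released.
    await close(resource);
  }
}
```

For the model above, `open` would call `ExecuTorchModel.loadFromAsset(...)`, `use` would call `model.forward(...)`, and `close` would call `model.dispose()`.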

Model Loading Options #

Method	Platforms	Use Case
`loadFromAsset(path)`	All (including web)	Bundled assets
`loadFromBytes(bytes)`	All (including web)	Downloaded/cached models
`load(filePath)`	Native only	External file paths

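import 'package:flutter/services.dart' show rootBundle;

// From bytes
final byteData = await rootBundle.load('assets/models/model.pte');
final modelFromBytes = await ExecuTorchModel.loadFromBytes(
  // Respect the view's offset and length instead of exposing the whole buffer
  byteData.buffer.asUint8List(byteData.offsetInBytes, byteData.lengthInBytes),
);

// From file path (native platforms only)
final modelFromFile = await ExecuTorchModel.load('/path/to/model.pte');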

Platform Support #

Platform	Min Version	Architectures	Backends
Android	API 23+	arm64-v8a, armeabi-v7a, x86_64, x86	XNNPACK, Vulkan*
iOS	13.0+	arm64, x86_64+arm64 (sim)	XNNPACK, CoreML, MPS, Vulkan*
macOS	11.0+	arm64, x86_64	XNNPACK, CoreML, MPS, Vulkan*
Windows	10+	x64	XNNPACK, Vulkan*
Linux	Ubuntu 20.04+	x64, arm64	XNNPACK, Vulkan*
Web	Modern browsers	WebAssembly	XNNPACK (Wasm SIMD)

*Vulkan is opt-in and experimental. See Vulkan Backend.

Platform Configuration #

If you encounter deployment target errors, update your project settings:

iOS Deployment Target (iOS 13.0+)

Open ios/Runner.xcworkspace in Xcode
Select Runner target → Build Settings
Search "iOS Deployment Target" → Set to 13.0

macOS Deployment Target (macOS 11.0+)

Open macos/Runner.xcworkspace in Xcode
Select Runner target → Build Settings
Search "macOS Deployment Target" → Set to 11.0

After updating, run:

flutter clean && flutter pub get && flutter build <platform>

API Reference #

ExecuTorchModel #

// Load methods
static Future<ExecuTorchModel> loadFromAsset(String assetPath)
static Future<ExecuTorchModel> loadFromBytes(Uint8List modelBytes)
static Future<ExecuTorchModel> load(String filePath)  // Native only

// Inference
Future<List<TensorData>> forward(List<TensorData> inputs)

// Lifecycle
Future<void> dispose()
bool get isDisposed

ExecuTorchLLM (experimental) #

On-device text generation with Google Gemma 4 E2B, streamed token by token and separate from the tensor API. Models are loaded from file paths (weights are 1+ GB) and driven by a stateful decode loop with a tokenizer and KV cache. Backends: XNNPACK (CPU, all platforms) and MLX (Apple Silicon GPU, macOS arm64).

// Load (file paths; mlxMetallibPath is MLX-only)
static Future<ExecuTorchLLM> load({
  required String modelPath,
  required String tokenizerPath,
  String? dataPath,
  String? mlxMetallibPath,
})

// Stream tokens as they decode
Stream<String> generate(String prompt, {GenConfig config})

// Control / lifecycle
void stop();              // cooperative cancel mid-generation
void reset();             // clear KV cache / start a new conversation
Future<void> dispose();   // release native resources

// GenConfig — temperature-only sampling (no top-p/top-k)
const GenConfig({int maxNewTokens, int seqLen, double temperature, bool echo, bool ignoreEos});
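
Because `generate` returns a `Stream<String>` and `stop()` is cooperative, a caller can end generation early once some condition is met. The sketch below is plain Dart and works on any `Stream<String>`. `collectUntil` and its `stopText` parameter are hypothetical, and whether the stream ever emits a marker such as `<turn|>` as text is an assumption.

```dart
/// Collects streamed text until [stopText] appears, then calls [onStop]
/// and returns everything before the stop text.
Future<String> collectUntil(
  Stream<String> pieces, {
  required String stopText,
  void Function()? onStop,
}) async {
  final buffer = StringBuffer();
  await for (final piece in pieces) {
    buffer.write(piece);
    // Search the accumulated text so a marker split across pieces is found.
    final text = buffer.toString();
    final cut = text.indexOf(stopText);
    if (cut >= 0) {
      onStop?.call();
      // Returning from inside await-for cancels the stream subscription.
      return text.substring(0, cut);
    }
  }
  return buffer.toString();
}
```

With the API above, `llm.stop` could be passed as `onStop` so the native decode loop is asked to stop as well.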

import 'dart:io'; // for stdout

final llm = await ExecuTorchLLM.load(
  modelPath: '/path/gemma-4-E2B-it_xnnpack.pte',
  tokenizerPath: '/path/gemma-4-E2B-it_tokenizer.json',
);
// Gemma 4 needs its turn markers around the message:
final prompt = '<bos><|turn>user\nExplain Flutter in one line.<turn|>\n<|turn>model\n';
// Print each piece as soon as it is decoded
await for (final piece in llm.generate(prompt,
    config: const GenConfig(maxNewTokens: 512, temperature: 0))) {
  stdout.write(piece);
}
await llm.dispose();
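
The turn markers in the prompt above are easy to mistype, so it can help to build the string in one place. This sketch reproduces the single-turn format shown in the example. `buildGemmaPrompt` is a hypothetical helper, and rejecting user text that contains the markers is a design choice of this sketch, not documented package behavior.

```dart
/// Wraps [userMessage] in the single-turn Gemma 4 markers used above.
String buildGemmaPrompt(String userMessage) {
  // Refuse input that contains control markers, so user text cannot
  // open or close a turn on its own.
  const markers = ['<bos>', '<|turn>', '<turn|>'];
  for (final marker in markers) {
    if (userMessage.contains(marker)) {
      throw ArgumentError.value(
          userMessage, 'userMessage', 'must not contain $marker');
    }
  }
  return '<bos><|turn>user\n$userMessage<turn|>\n<|turn>model\n';
}
```

The result can be passed directly to `llm.generate(...)`. Multi-turn formatting is covered in docs/LLM.md and is not attempted here.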

Enable it in pubspec.yaml (hooks.user_defines.executorch_flutter):

llm: true
backends: [xnnpack, mlx]   # mlx is dropped automatically on anything other than macOS arm64

📖 Full guide: docs/LLM.md covers model export (the Gemma 4 scripts), the chat template, the MLX mlx.metallib shipping step, stopping, platform support, and troubleshooting. A complete streaming chat screen is in example/lib/screens/llm_chat_screen.dart.

TensorData #

final tensor = TensorData(
  shape: [1, 3, 224, 224],       // Dimensions
  dataType: TensorType.float32,  // float32, int32, int8, uint8
  data: Uint8List(...),          // Raw bytes
  name: 'input_0',               // Optional
);
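
Since `data` is raw bytes, its length has to agree with `shape` and `dataType`. The sketch below computes the expected byte count and packs float values into bytes. It uses plain strings for the type names to stay self-contained (the package itself uses `TensorType`), and both helper names are hypothetical.

```dart
import 'dart:typed_data';

/// Element sizes for the types listed above.
const Map<String, int> bytesPerElement = {
  'float32': 4,
  'int32': 4,
  'int8': 1,
  'uint8': 1,
};

/// Number of bytes a tensor with [shape] and [dataType] should occupy.
int expectedByteLength(List<int> shape, String dataType) {
  final size = bytesPerElement[dataType];
  if (size == null) {
    throw ArgumentError.value(dataType, 'dataType', 'unsupported type');
  }
  // An empty shape (a scalar) yields one element.
  final elements = shape.fold<int>(1, (count, dim) => count * dim);
  return elements * size;
}

/// Packs [values] as float32 and returns the underlying bytes
/// in host byte order.
Uint8List float32ToBytes(List<double> values) {
  final floats = Float32List.fromList(values);
  return floats.buffer.asUint8List(floats.offsetInBytes, floats.lengthInBytes);
}
```

For the `[1, 3, 224, 224]` float32 tensor above, the expected length is 1 × 3 × 224 × 224 × 4 = 602112 bytes.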

BackendQuery #

Query available backends at runtime:

// Check specific backend
final ExecuTorchModel model;
if (BackendQuery.isAvailable(Backend.coreml)) {
  model = await ExecuTorchModel.loadFromAsset('assets/model_coreml.pte');
} else {
  model = await ExecuTorchModel.loadFromAsset('assets/model_xnnpack.pte');
}

// List all available backends
final backends = BackendQuery.available;
print('Available: ${backends.map((b) => b.displayName).join(", ")}');

Backend	Display Name	Platforms
`Backend.xnnpack`	XNNPACK	All
`Backend.coreml`	CoreML	iOS, macOS
`Backend.mps`	Metal Performance Shaders	iOS, macOS
`Backend.vulkan`	Vulkan	Android, iOS, macOS, Windows, Linux
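
When an app ships models for more than two backends, the if/else above can be generalized to a preference list. This is a generic sketch, and `firstAvailable` is a hypothetical helper, not part of the package.

```dart
/// Returns the first entry of [preferred] that also occurs in [available],
/// or null when none of them is available.
T? firstAvailable<T>(Iterable<T> preferred, Iterable<T> available) {
  final availableSet = available.toSet();
  for (final candidate in preferred) {
    if (availableSet.contains(candidate)) return candidate;
  }
  return null;
}
```

A call might look like `firstAvailable([Backend.coreml, Backend.mps, Backend.xnnpack], BackendQuery.available)`, assuming `BackendQuery.available` is an iterable of `Backend` values as the example above suggests.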

Exception Hierarchy #

ExecuTorchException (base)
├── ExecuTorchModelException      // Model loading/lifecycle
├── ExecuTorchInferenceException  // Inference execution
├── ExecuTorchValidationException // Tensor validation
├── ExecuTorchMemoryException     // Memory/resources
├── ExecuTorchIOException         // File I/O
└── ExecuTorchPlatformException   // Platform communication

Build Configuration #

Configure the native build in your app's pubspec.yaml:

hooks:
  user_defines:
    executorch_flutter:
      debug: false              # Enable debug logging
      build_mode: "prebuilt"    # "prebuilt", "local", or "source"
      # prebuilt_version: "1.1.0.7"  # Optional: pin specific native version
      # For source mode: build from local ExecuTorch checkout
      # build_mode: "source"
      # executorch_source: "/path/to/executorch"
      # For local mode: point at pre-compiled libraries
      # local_lib_dir: "/path/to/compiled/libs"
      backends:
        - xnnpack
        - coreml
        - mps

Options #

Option	Default	Description
`debug`	`false`	Debug logging + debug binaries
`build_mode`	`"prebuilt"`	`"prebuilt"` (fast), `"local"` (pre-compiled), or `"source"` (from source)
`prebuilt_version`	Current	Prebuilt release version
`executorch_source`	-	Path to local ExecuTorch checkout (source mode)
`local_lib_dir`	-	Path to pre-compiled libraries (local mode)
`backends`	Platform-specific	Backends to enable

Default Backends by Platform #

Platform	Defaults
Android	xnnpack
iOS	xnnpack, coreml, mps
macOS	xnnpack, coreml, mps
Windows/Linux	xnnpack

Environment Variables #

Variable	Description
`EXECUTORCH_BUILD_MODE`	Override build mode (`prebuilt`, `local`, `source`)
`EXECUTORCH_SOURCE_DIR`	Path to local ExecuTorch checkout (source mode)
`EXECUTORCH_INSTALL_DIR`	Path to pre-compiled libraries (local mode)
`EXECUTORCH_CACHE_DIR`	Custom cache directory for source builds
`EXECUTORCH_DISABLE_DOWNLOAD`	Skip prebuilt download

Advanced Usage #

Preprocessing Strategies #

The example app demonstrates three preprocessing approaches:

Strategy	Performance	Platforms	Dependencies
GPU Shader	~75ms (web), comparable to OpenCV (native)	All	None
OpenCV	Very fast	Native only	opencv_dart
CPU (image lib)	~560ms (web), slower	All	image

GPU Preprocessing Tutorial - Step-by-step guide with GLSL shader examples.
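
For orientation, the core of a CPU preprocessing path can be sketched in plain Dart: convert interleaved RGBA pixels into a planar, normalized float32 buffer. This is not the example app's implementation. The function name is hypothetical, and the default mean and standard deviation are the common ImageNet values, which your model may or may not expect. Resizing is left out.

```dart
import 'dart:typed_data';

/// Converts interleaved RGBA bytes into planar (CHW) float32 values,
/// scaling to [0, 1] and then applying per-channel (v - mean) / std.
Float32List rgbaToChwFloat32(
  Uint8List rgba,
  int width,
  int height, {
  List<double> mean = const [0.485, 0.456, 0.406],
  List<double> std = const [0.229, 0.224, 0.225],
}) {
  final plane = width * height;
  if (rgba.length != plane * 4) {
    throw ArgumentError('expected ${plane * 4} RGBA bytes, got ${rgba.length}');
  }
  final out = Float32List(3 * plane);
  for (var pixel = 0; pixel < plane; pixel++) {
    for (var channel = 0; channel < 3; channel++) {
      // The alpha byte (index 3 of each pixel) is skipped.
      final value = rgba[pixel * 4 + channel] / 255.0;
      out[channel * plane + pixel] = (value - mean[channel]) / std[channel];
    }
  }
  return out;
}
```

The returned list matches a `[1, 3, height, width]` float32 tensor once its bytes are passed as `data`.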

Web Platform #

On the web, inference runs in WebAssembly using the XNNPACK backend.

Performance #

Metric	Native	Web (Wasm)
YOLO11n Inference	~50-100ms	~622ms
Total E2E	~150-200ms	~855ms

When to use Web:

Demos and prototyping
Interactive inference (sub-second)
No app install required

Not recommended for:

Real-time camera inference
High-throughput batch processing
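
The recommendation follows from simple arithmetic on the latencies in the table: end-to-end latency caps the achievable frame rate. A small sketch, where both function names and any target frame rate are illustrative assumptions:

```dart
/// Upper bound on frames per second for a given end-to-end latency,
/// assuming frames are processed one at a time.
double framesPerSecond(double latencyMs) => 1000.0 / latencyMs;

/// Whether a pipeline with [latencyMs] per frame can keep up with [targetFps].
bool meetsFrameRate(double latencyMs, double targetFps) =>
    framesPerSecond(latencyMs) >= targetFps;
```

At ~855 ms per frame the web path tops out at about 1.17 frames per second, while the native range of ~150-200 ms corresponds to roughly 5 to 6.7 frames per second.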

Setup #

Run setup script:
```
dart run executorch_flutter:setup_web
```

Add to web/index.html:

<head>
  <script src="js/executorch_wrapper.js"></script>
</head>

Use XNNPACK models (same as native).

Example Application #

The example/ directory includes:

Unified Model Playground - Multiple model types in one interface
MobileNet V3 - Image classification (1000 ImageNet classes)
YOLO - Object detection (v5, v8, v11)
Camera Mode - Real-time inference
Settings - Thresholds, preprocessing, performance overlay

cd example
flutter run -d macos  # or ios, android, windows, linux, chrome

Converting PyTorch Models to ExecuTorch #

Convert your PyTorch models to .pte format:

Official ExecuTorch Export Guide

Example app models are hosted at executorch_flutter_models and downloaded automatically.

To export manually:

cd models/python
python3 main.py

LLM (Gemma 4) models are exported with dedicated scripts (they need a tokenizer + quantization recipe, not the tensor export path):

python models/python/export_gemma4_xnnpack.py   # CPU model (all platforms)
python models/python/export_gemma4_mlx.py        # Apple-GPU model (macOS)

See docs/LLM.md for the full export recipe, the required tokenizer.json / mlx.metallib, and how to load them with ExecuTorchLLM.

Troubleshooting #

Model loading fails

Verify asset is listed in pubspec.yaml
Check model bytes: modelBytes.lengthInBytes > 0
Re-export the model with an ExecuTorch version that matches the runtime bundled with this package

Inference returns error

Check model.inputShapes / model.outputShapes
Verify tensor data types match expectations
Ensure shapes match exactly (including batch dimension)
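
A shape check before calling `forward` can turn an opaque inference error into a readable message. The sketch below compares two shapes dimension by dimension. `describeShapeMismatch` is a hypothetical helper, and it assumes that `model.inputShapes` yields lists of integers, which this README does not spell out.

```dart
/// Returns null when the shapes are identical, otherwise a short
/// description of the first difference found.
String? describeShapeMismatch(List<int> expected, List<int> actual) {
  if (expected.length != actual.length) {
    return 'rank mismatch: expected ${expected.length} dims, '
        'got ${actual.length}';
  }
  for (var i = 0; i < expected.length; i++) {
    if (expected[i] != actual[i]) {
      return 'dim $i mismatch: expected ${expected[i]}, got ${actual[i]}';
    }
  }
  return null;
}
```

A missing batch dimension, for example `[3, 224, 224]` against `[1, 3, 224, 224]`, is reported as a rank mismatch.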

Memory issues

Always call dispose() when done
Limit how many models are loaded at once; each instance holds native memory until it is disposed

Experimental: Vulkan Backend #

Warning: Vulkan is experimental and opt-in.

Status #

Platform	Status
Android	Works on most devices; see #26 for PowerVR GPU status
Windows/Linux	Generally functional
macOS/iOS	Works via MoltenVK (Vulkan-to-Metal translation)

Enable Vulkan #

hooks:
  user_defines:
    executorch_flutter:
      backends:
        - xnnpack
        - vulkan

Vulkan Troubleshooting #

"uniform data allocation exceeded" on Android

This can occur when Vulkan tensor metadata exceeds the per-tensor uniform buffer limit. Fix submitted upstream: pytorch/executorch#17294.

Vulkan on PowerVR GPUs

Some PowerVR devices may produce incorrect Vulkan results due to texture dimension limits. Being tracked upstream: pytorch/executorch#17299. XNNPACK is recommended as a fallback.

Recommendations #

Production: Use XNNPACK (stable everywhere)
Apple platforms: Use CoreML or MPS instead of Vulkan
Testing: Report issues with device info and logs

Report Vulkan Issues

Contributing #

Contributions welcome! See CONTRIBUTING.md for guidelines.

Acknowledgments #

opencv_dart - Referenced for understanding Flutter native assets build patterns and cross-platform FFI packaging

License #

MIT License - see LICENSE.

Support #

Built with love for the Flutter and PyTorch communities.
