AI Edge #
A Flutter plugin for on-device AI inference powered by MediaPipe GenAI. Run large language models directly on mobile devices with optimized performance and privacy.
Features #
- 🚀 On-device inference - Run LLMs locally without internet connection
- 🔒 Privacy-first - All processing happens on the device, no data leaves the phone
- ⚡ Hardware acceleration - Supports GPU acceleration for faster inference
- 🌊 Streaming responses - Real-time text generation with partial results
- 🖼️ Multi-modal support - Process both text and images (vision-language models)
- 💬 Session management - Maintain conversation context across multiple queries
- 🎯 Flexible configuration - Customize temperature, top-k, top-p, and other parameters
Installation #
flutter pub add ai_edge
Or add it manually to your pubspec.yaml:
dependencies:
  ai_edge: ^0.0.1
Getting Started #
1. Basic Setup #
import 'package:ai_edge/ai_edge.dart';
// Initialize the AI model
final aiEdge = AiEdge.instance;
await aiEdge.initialize(
modelPath: '/path/to/your/model.task',
maxTokens: 512,
);
// Generate a response
final response = await aiEdge.generateResponse('What is Flutter?');
print(response);
// Clean up when done
await aiEdge.close();
2. Download and Setup Model #
This plugin requires a MediaPipe Task format model (.task file). You can:
- Download pre-converted .task models from the MediaPipe Model Gallery
- Convert your own models to .task format using MediaPipe Model Maker
- Download ready-to-use .task models from Hugging Face (example: gemma-3n-E2B-it-litert-preview)
Place the model file in your app's documents directory or assets.
Usage #
Basic Text Generation #
// Simple query-response
final response = await aiEdge.generateResponse(
'Explain quantum computing in simple terms'
);
Streaming Responses #
// Get real-time partial results as the model generates text
final stream = aiEdge.generateResponseAsync('Write a story about a robot');
await for (final event in stream) {
print('Partial: ${event.partialResult}');
if (event.done) {
print('Generation completed!');
}
}
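In a UI you would typically re-render on each event. A minimal sketch, assuming partialResult carries the text generated so far (as described in the API reference below), with updateUi standing in as a hypothetical callback for your widget's state update:
import 'package:ai_edge/ai_edge.dart';

/// Drives a UI from the generation stream. `updateUi` is a
/// hypothetical callback standing in for your widget's state update.
Future<String> streamToUi(
  AiEdge aiEdge,
  String prompt,
  void Function(String text) updateUi,
) async {
  var latest = '';
  await for (final event in aiEdge.generateResponseAsync(prompt)) {
    // partialResult holds the text generated so far (see API reference),
    // so each event replaces the displayed text rather than appending.
    latest = event.partialResult;
    updateUi(latest);
    if (event.done) break;
  }
  return latest;
}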
Multi-turn Conversations #
// Build context for conversations
await aiEdge.addQueryChunk('You are a helpful assistant.');
await aiEdge.addQueryChunk('Previous context: User asked about Flutter');
final response = await aiEdge.generateResponse(
'What are its main advantages?'
);
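The same pattern can be wrapped in a small helper. A sketch, assuming chunks are applied in the order they are added; the helper and the transcript format are illustrative, not part of the plugin's API:
import 'package:ai_edge/ai_edge.dart';

/// Illustrative helper: seeds the session with a system prompt and
/// prior turns, then asks the new question.
Future<String> askWithContext(
  AiEdge aiEdge, {
  required String systemPrompt,
  required List<String> priorTurns,
  required String question,
}) async {
  await aiEdge.addQueryChunk(systemPrompt);
  for (final turn in priorTurns) {
    await aiEdge.addQueryChunk(turn);
  }
  return aiEdge.generateResponse(question);
}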
Multi-modal Input (Text + Image) #
import 'dart:io';
// Add image to the session
final imageBytes = await File('path/to/image.jpg').readAsBytes();
await aiEdge.addImage(imageBytes);
// Ask about the image
final response = await aiEdge.generateResponse(
'What objects do you see in this image?'
);
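Image input requires a vision-language model. An end-to-end sketch, assuming initialize accepts the maxNumImages setting listed under ModelConfig in the API reference below:
import 'dart:io';
import 'package:ai_edge/ai_edge.dart';

Future<String> describeImage(String modelPath, String imagePath) async {
  final aiEdge = AiEdge.instance;
  // maxNumImages is assumed to be a named parameter of initialize,
  // mirroring the ModelConfig field in the API reference below.
  await aiEdge.initialize(
    modelPath: modelPath,
    maxTokens: 512,
    maxNumImages: 1,
  );
  await aiEdge.addImage(await File(imagePath).readAsBytes());
  return aiEdge.generateResponse('What objects do you see in this image?');
}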
Advanced Configuration #
// Initialize with custom settings
await aiEdge.initialize(
modelPath: modelPath,
maxTokens: 1024,
preferredBackend: PreferredBackend.gpu, // Android only, ignored on iOS
sessionConfig: SessionConfig(
temperature: 0.7, // Control randomness (0.0-1.0)
topK: 40, // Limit vocabulary size
topP: 0.95, // Nucleus sampling threshold
randomSeed: 42, // For reproducible outputs
),
);
Platform Setup #
iOS Requirements #
- iOS 15.0 or later
- Add to your Info.plist if loading models from the network:

<key>NSAppTransportSecurity</key>
<dict>
  <key>NSAllowsArbitraryLoads</key>
  <true/>
</dict>
Android Requirements #
- Minimum SDK: Android API level 24 (Android 7.0) or later
  - This is a requirement of the MediaPipe GenAI SDK
  - Flutter's default minSdkVersion is 21, so you must update it
- Add to your android/app/build.gradle:

android {
    defaultConfig {
        minSdkVersion 24 // Required by MediaPipe GenAI
    }
}

- Recommended devices:
  - Optimal performance on Pixel 7 or newer
  - Other high-end Android devices with comparable specs
- For large models, you may need to increase heap size in android/app/src/main/AndroidManifest.xml:

<application android:largeHeap="true" ...>
Model Preparation #
Model Format #
This plugin uses the MediaPipe Task format (.task files), which is optimized for mobile inference. Any LLM model converted to .task format can be used with this plugin.
Tested models include:
- Gemma models (2B, 3B variants)
- Hammer models
- Other models converted to MediaPipe Task format
The key requirement is that the model must be in .task format. Models in other formats (GGUF, SafeTensors, etc.) need to be converted first.
Model Storage #
Option 1: Download at runtime (Recommended for large models)
import 'package:path_provider/path_provider.dart';
// Download model to app's documents directory
final documentsDir = await getApplicationDocumentsDirectory();
final modelPath = '${documentsDir.path}/models/gemma.task';
// Use the modelPath with AiEdge
await aiEdge.initialize(modelPath: modelPath, maxTokens: 512);
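The snippet above only builds the path; the download itself is up to you. A minimal sketch using package:http (an assumption; any HTTP client works), streaming to disk and skipping the fetch when the file already exists:
import 'dart:io';
import 'package:http/http.dart' as http;
import 'package:path_provider/path_provider.dart';

/// Returns the local model path, downloading the file on first launch only.
Future<String> ensureModel(Uri modelUrl) async {
  final documentsDir = await getApplicationDocumentsDirectory();
  final file = File('${documentsDir.path}/models/gemma.task');
  if (!file.existsSync()) {
    await file.parent.create(recursive: true);
    // Stream to disk so multi-gigabyte models are not buffered in memory.
    final client = http.Client();
    try {
      final response = await client.send(http.Request('GET', modelUrl));
      await response.stream.pipe(file.openWrite());
    } finally {
      client.close();
    }
  }
  return file.path;
}
If the model is gated on Hugging Face, you will also need to attach your access token as an Authorization header on the request.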
Option 2: Bundle with app (For smaller models)
// First, add to pubspec.yaml:
// flutter:
// assets:
// - assets/models/
// Then use the asset path directly
await aiEdge.initialize(
modelPath: 'assets/models/model.task',
maxTokens: 512,
);
Model Size Considerations:
- Models < 100MB: Can be bundled as assets
- Models > 100MB: Download on first launch to save app size
- Use the example app's download manager for a reference implementation
API Reference #
Main Classes #
AiEdge
The main entry point for the plugin. Provides methods for model initialization and text generation.
ModelConfig
Configuration for model initialization:
- modelPath: Path to the .task model file
- maxTokens: Maximum tokens to generate
- preferredBackend: CPU or GPU acceleration (Android only, ignored on iOS)
- supportedLoraRanks: LoRA adapter support (see the sketch below)
- maxNumImages: Maximum images for multi-modal input
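For example, a model shipped with LoRA adapters might be initialized like this; a sketch assuming supportedLoraRanks is exposed as a named parameter of initialize, mirroring the field above:
await AiEdge.instance.initialize(
  modelPath: modelPath,
  maxTokens: 1024,
  supportedLoraRanks: [4, 8], // illustrative rank values
);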
SessionConfig
Configuration for generation sessions:
- temperature: Controls randomness (0.0-1.0)
- topK: Top-K sampling parameter
- topP: Top-P (nucleus) sampling parameter
- randomSeed: Seed for reproducible generation
GenerationEvent
Event emitted during streaming generation:
- partialResult: The generated text so far
- done: Whether generation is complete
Example App #
Check out the example directory for a complete chat application demonstrating:
- Model download and management
- Real-time streaming responses
- Conversation history
- Error handling
- UI best practices
Run the example:
cd example
flutter run
Development #
Running Tests #
Setting up Hugging Face Token
Some integration tests download models from Hugging Face and require authentication:
# Set your Hugging Face token as an environment variable
export HF_TOKEN="your_hugging_face_token_here"
# Or pass it directly when running tests
HF_TOKEN="your_token" ./scripts/test_local_ios.sh
To get a Hugging Face token:
- Create an account at huggingface.co
- Go to Settings → Access Tokens
- Create a new token with read permissions
Running Test Scripts
Quick test commands:
# iOS (macOS only)
./scripts/test_local_ios.sh
# Android
./scripts/test_local_android.sh
Troubleshooting #
Common Issues #
Model loading fails:
- Ensure the model file exists at the specified path
- Check file permissions
- Verify the model is in MediaPipe Task format
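A small pre-flight check (a sketch using only dart:io) can catch the first and third causes before calling initialize:
import 'dart:io';

/// Illustrative pre-flight check for the first and third causes above.
Future<void> checkModel(String modelPath) async {
  final file = File(modelPath);
  if (!await file.exists()) {
    throw StateError('Model not found at $modelPath');
  }
  if (!modelPath.endsWith('.task')) {
    throw StateError('Expected a MediaPipe .task file: $modelPath');
  }
}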
Out of memory errors:
- Use smaller models or reduce maxTokens
- Enable largeHeap on Android
- Consider using quantized models
Slow inference:
- Enable GPU acceleration with PreferredBackend.gpu
- Use smaller models for faster response
- Reduce maxTokens for shorter outputs
License #
This project is licensed under the BSD 3-Clause License - see the LICENSE file for details.
Acknowledgments #
This plugin is built on top of MediaPipe GenAI by Google, providing optimized on-device inference for mobile platforms.