AI Edge #
A Flutter plugin for on-device AI inference powered by MediaPipe GenAI. Run large language models directly on mobile devices with optimized performance and privacy.
Features #
- 🚀 On-device inference - Run LLMs locally without internet connection
- 🔒 Privacy-first - All processing happens on the device, no data leaves the phone
- ⚡ Hardware acceleration - Supports GPU acceleration for faster inference
- 🌊 Streaming responses - Real-time text generation with partial results
- 🖼️ Multi-modal support - Process both text and images (vision-language models)
- 💬 Session management - Maintain conversation context across multiple queries
- 🎯 Flexible configuration - Customize temperature, top-k, top-p, and other parameters
Installation #
flutter pub add ai_edge
Or add it manually to your pubspec.yaml:
dependencies:
  ai_edge: ^0.0.1
Getting Started #
1. Basic Setup #
import 'package:ai_edge/ai_edge.dart';
// Initialize the AI model
final aiEdge = AiEdge.instance;
await aiEdge.initialize(
modelPath: '/path/to/your/model.task',
maxTokens: 512,
);
// Generate a response
final response = await aiEdge.generateResponse('What is Flutter?');
print(response);
// Clean up when done
await aiEdge.close();
2. Download and Setup Model #
This plugin requires a MediaPipe Task format model (.task file). You can:
- Download pre-converted .task models from the MediaPipe Model Gallery
- Convert your own models to .task format using MediaPipe Model Maker
- Download ready-to-use .task models from Hugging Face (example: gemma-3n-E2B-it-litert-preview)
Place the model file in your app's documents directory or assets.
Usage #
Basic Text Generation #
// Simple query-response
final response = await aiEdge.generateResponse(
'Explain quantum computing in simple terms'
);
Streaming Responses #
// Get real-time partial results as the model generates text
final stream = aiEdge.generateResponseAsync('Write a story about a robot');
await for (final event in stream) {
print('Partial: ${event.partialResult}');
if (event.done) {
print('Generation completed!');
}
}
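In a UI you would typically re-render on each event. A minimal sketch, assuming partialResult carries the text generated so far (as described in the API reference below), with updateUi standing in as a hypothetical callback for your widget's state update:
import 'package:ai_edge/ai_edge.dart';

/// Drives a UI from the generation stream. `updateUi` is a
/// hypothetical callback standing in for your widget's state update.
Future<String> streamToUi(
  AiEdge aiEdge,
  String prompt,
  void Function(String text) updateUi,
) async {
  var latest = '';
  await for (final event in aiEdge.generateResponseAsync(prompt)) {
    // partialResult holds the text generated so far (see API reference),
    // so each event replaces the displayed text rather than appending.
    latest = event.partialResult;
    updateUi(latest);
    if (event.done) break;
  }
  return latest;
}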
Multi-turn Conversations #
// Build context for conversations
await aiEdge.addQueryChunk('You are a helpful assistant.');
await aiEdge.addQueryChunk('Previous context: User asked about Flutter');
final response = await aiEdge.generateResponse(
'What are its main advantages?'
);
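The same pattern can be wrapped in a small helper. A sketch, assuming chunks are applied in the order they are added; the helper and the transcript format are illustrative, not part of the plugin's API:
import 'package:ai_edge/ai_edge.dart';

/// Illustrative helper: seeds the session with a system prompt and
/// prior turns, then asks the new question.
Future<String> askWithContext(
  AiEdge aiEdge, {
  required String systemPrompt,
  required List<String> priorTurns,
  required String question,
}) async {
  await aiEdge.addQueryChunk(systemPrompt);
  for (final turn in priorTurns) {
    await aiEdge.addQueryChunk(turn);
  }
  return aiEdge.generateResponse(question);
}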
Multi-modal Input (Text + Image) #
import 'dart:io';
// Add image to the session
final imageBytes = await File('path/to/image.jpg').readAsBytes();
await aiEdge.addImage(imageBytes);
// Ask about the image
final response = await aiEdge.generateResponse(
'What objects do you see in this image?'
);
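Image input requires a vision-language model. An end-to-end sketch, assuming initialize accepts the maxNumImages setting listed under ModelConfig in the API reference below:
import 'dart:io';
import 'package:ai_edge/ai_edge.dart';

Future<String> describeImage(String modelPath, String imagePath) async {
  final aiEdge = AiEdge.instance;
  // maxNumImages is assumed to be a named parameter of initialize,
  // mirroring the ModelConfig field in the API reference below.
  await aiEdge.initialize(
    modelPath: modelPath,
    maxTokens: 512,
    maxNumImages: 1,
  );
  await aiEdge.addImage(await File(imagePath).readAsBytes());
  return aiEdge.generateResponse('What objects do you see in this image?');
}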
Advanced Configuration #
// Initialize with custom settings
await aiEdge.initialize(
modelPath: modelPath,
maxTokens: 1024,
preferredBackend: PreferredBackend.gpu, // Android only, ignored on iOS
sessionConfig: SessionConfig(
temperature: 0.7, // Control randomness (0.0-1.0)
topK: 40, // Limit vocabulary size
topP: 0.95, // Nucleus sampling threshold
randomSeed: 42, // For reproducible outputs
),
);
Platform Setup #
iOS Requirements #
- iOS 15.0 or later
- Add to your Info.plist if loading models from the network:

<key>NSAppTransportSecurity</key>
<dict>
  <key>NSAllowsArbitraryLoads</key>
  <true/>
</dict>
Android Requirements #
- Minimum SDK: Android API level 24 (Android 7.0) or later
  - This is a requirement of the MediaPipe GenAI SDK
  - Flutter's default minSdkVersion is 21, so you must update it
- Add to your android/app/build.gradle:

android {
    defaultConfig {
        minSdkVersion 24 // Required by MediaPipe GenAI
    }
}

- Recommended devices:
  - Optimal performance on Pixel 7 or newer
  - Other high-end Android devices with comparable specs
- For large models, you may need to increase heap size in android/app/src/main/AndroidManifest.xml:

<application android:largeHeap="true" ...>
Model Preparation #
Model Format #
This plugin uses the MediaPipe Task format (.task files), which is optimized for mobile inference. Any LLM model converted to .task format can be used with this plugin.
Tested models include:
- Gemma models (2B, 3B variants)
- Hammer models
- Other models converted to MediaPipe Task format
The key requirement is that the model must be in .task format. Models in other formats (GGUF, SafeTensors, etc.) need to be converted first.
Model Storage #
Option 1: Download at runtime (Recommended for large models)
import 'package:path_provider/path_provider.dart';
// Download model to app's documents directory
final documentsDir = await getApplicationDocumentsDirectory();
final modelPath = '${documentsDir.path}/models/gemma.task';
// Use the modelPath with AiEdge
await aiEdge.initialize(modelPath: modelPath, maxTokens: 512);
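The snippet above only builds the path; the download itself is up to you. A minimal sketch using package:http (an assumption; any HTTP client works), streaming to disk and skipping the fetch when the file already exists:
import 'dart:io';
import 'package:http/http.dart' as http;
import 'package:path_provider/path_provider.dart';

/// Returns the local model path, downloading the file on first launch only.
Future<String> ensureModel(Uri modelUrl) async {
  final documentsDir = await getApplicationDocumentsDirectory();
  final file = File('${documentsDir.path}/models/gemma.task');
  if (!file.existsSync()) {
    await file.parent.create(recursive: true);
    // Stream to disk so multi-gigabyte models are not buffered in memory.
    final client = http.Client();
    try {
      final response = await client.send(http.Request('GET', modelUrl));
      await response.stream.pipe(file.openWrite());
    } finally {
      client.close();
    }
  }
  return file.path;
}
If the model is gated on Hugging Face, you will also need to attach your access token as an Authorization header on the request.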
Option 2: Bundle with app (For smaller models)
// First, add to pubspec.yaml:
// flutter:
// assets:
// - assets/models/
// Then use the asset path directly
await aiEdge.initialize(
modelPath: 'assets/models/model.task',
maxTokens: 512,
);
Model Size Considerations:
- Models < 100MB: Can be bundled as assets
- Models > 100MB: Download on first launch to save app size
- Use the example app's download manager for a reference implementation
API Reference #
Main Classes #
AiEdge
The main entry point for the plugin. Provides methods for model initialization and text generation.
ModelConfig
Configuration for model initialization:
- modelPath: Path to the .task model file
- maxTokens: Maximum tokens to generate
- preferredBackend: CPU or GPU acceleration (Android only, ignored on iOS)
- supportedLoraRanks: LoRA adapter support (see the sketch below)
- maxNumImages: Maximum images for multi-modal input
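For example, a model shipped with LoRA adapters might be initialized like this; a sketch assuming supportedLoraRanks is exposed as a named parameter of initialize, mirroring the field above:
await AiEdge.instance.initialize(
  modelPath: modelPath,
  maxTokens: 1024,
  supportedLoraRanks: [4, 8], // illustrative rank values
);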
SessionConfig
Configuration for generation sessions:
- temperature: Controls randomness (0.0-1.0)
- topK: Top-K sampling parameter
- topP: Top-P (nucleus) sampling parameter
- randomSeed: Seed for reproducible generation
GenerationEvent
Event emitted during streaming generation:
- partialResult: The generated text so far
- done: Whether generation is complete
Example App #
Check out the example directory for a complete chat application demonstrating:
- Model download and management
- Real-time streaming responses
- Conversation history
- Error handling
- UI best practices
Run the example:
cd example
flutter run
Development #
Running Tests #
Setting up Hugging Face Token
Some integration tests download models from Hugging Face and require authentication:
# Set your Hugging Face token as an environment variable
export HF_TOKEN="your_hugging_face_token_here"
# Or pass it directly when running tests
HF_TOKEN="your_token" ./scripts/test_local_ios.sh
To get a Hugging Face token:
- Create an account at huggingface.co
- Go to Settings → Access Tokens
- Create a new token with read permissions
Running Test Scripts
Quick test commands:
# iOS (macOS only)
./scripts/test_local_ios.sh
# Android
./scripts/test_local_android.sh
Troubleshooting #
Common Issues #
Model loading fails:
- Ensure the model file exists at the specified path
- Check file permissions
- Verify the model is in MediaPipe Task format
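A small pre-flight check (a sketch using only dart:io) can catch the first and third causes before calling initialize:
import 'dart:io';

/// Illustrative pre-flight check for the first and third causes above.
Future<void> checkModel(String modelPath) async {
  final file = File(modelPath);
  if (!await file.exists()) {
    throw StateError('Model not found at $modelPath');
  }
  if (!modelPath.endsWith('.task')) {
    throw StateError('Expected a MediaPipe .task file: $modelPath');
  }
}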
Out of memory errors:
- Use smaller models or reduce maxTokens
- Enable largeHeap on Android
- Consider using quantized models
Slow inference:
- Enable GPU acceleration with PreferredBackend.gpu
- Use smaller models for faster response
- Reduce maxTokens for shorter outputs
License #
This project is licensed under the BSD 3-Clause License - see the LICENSE file for details.
Acknowledgments #
This plugin is built on top of MediaPipe GenAI by Google, providing optimized on-device inference for mobile platforms.