flutter_mind 0.2.0
flutter_mind: ^0.2.0 copied to clipboard
A Flutter AI package supporting cloud (Gemini) and offline local models — clean API, streaming, smart defaults, and built-in prompt engineering.
flutter_mind
Any AI. One interface.
Why flutter_mind? #
Most AI packages for Flutter just wrap the API — you still have to write the prompts, handle errors, manage tokens, and figure out streaming yourself.
flutter_mind does more:
- 🔌 One API for all providers — same interface for cloud and local models
- 💬 Multi-turn chat — conversation history with automatic token trimming
- ⚡ Streaming — typing-effect UI out of the box
- 🧠 Thinking models — built-in support for reasoning budgets
- 🛡️ Safe by default — input validation, retry logic, and clear error messages
- 🎯 Zero Firebase required — API key for cloud, or fully offline with no key at all
Supported Providers #
| Provider | Status | Models |
|---|---|---|
| Google Gemini | ✅ v1 | Flash 2.5, Pro 2.5, Flash-Lite, and more |
| Local Model (offline) | ✅ v1 | Any .gguf — Qwen, Llama, Gemma, Phi, Mistral, and more |
| OpenAI | 🔜 v2 | GPT-4o, GPT-4o Mini |
| Anthropic Claude | 🔜 v2 | Sonnet, Opus, Haiku |
| Grok | 🔜 v2 | — |
| DeepSeek | 🔜 v2 | — |
Installation #
dependencies:
flutter_mind: ^0.1.0
flutter pub get
Quick Start #
import 'package:flutter_mind/flutter_mind.dart';
void main() {
FlutterMind.init(
engine: GeminiEngine(apiKey: 'YOUR_GEMINI_API_KEY'),
);
runApp(MyApp());
}
// Anywhere in your app — no imports, no passing around
final response = await FlutterMind.send(userMessage: 'suggest a game');
print(response.text);
Three lines in main(). Done.
Getting Your API Key #
Google Gemini — Free tier available #
- Go to aistudio.google.com/apikey
- Sign in with your Google account
- Click Create API Key — no credit card required
OpenAI (coming in v2) #
- Go to platform.openai.com → API Keys → Create new secret key
Anthropic Claude (coming in v2) #
- Go to console.anthropic.com → API Keys → Create Key
Local Model — No API key needed ✅ #
No account, no key, no internet required. Just a .gguf model file on the device.
See the Local Model (Offline) section for full setup.
Usage #
Send a message #
final response = await FlutterMind.send(userMessage: 'what is Flutter?');
print(response.text); // the response text
print(response.totalTokens); // total tokens used
print(response.inputTokens); // tokens in your message
print(response.outputTokens); // tokens in the response
Streaming — typing effect UI #
FlutterMind.stream(userMessage: 'tell me a story').listen((chunk) {
setState(() => text += chunk); // text appears word by word
});
Multi-turn chat — conversation with memory #
final history = <ChatMessage>[];
// First turn
final r1 = await FlutterMind.send(
userMessage: 'my name is Osama',
history: history,
);
history.add(ChatMessage.user('my name is Osama'));
history.add(ChatMessage.model(r1.text));
// Second turn — model remembers the name
final r2 = await FlutterMind.send(
userMessage: 'what is my name?',
history: history,
maxHistoryMessages: 20, // oldest turns are dropped automatically
);
print(r2.text); // "Your name is Osama"
Engine configuration #
Set your defaults once — every call uses them automatically:
FlutterMind.init(
engine: GeminiEngine(
apiKey: 'YOUR_KEY',
config: GeminiConfig(
model: GeminiModel.flash25,
systemPrompt: Prompt(role: 'game suggestion assistant'),
temperature: 0.8,
maxOutputTokens: 500,
),
),
);
Prompt engineering #
Control how the model behaves with the Prompt class — from one field to full expert config.
Tier 1 — Minimal
GeminiConfig(
systemPrompt: Prompt(role: 'game suggestion assistant'),
)
Tier 2 — Standard
Prompt(
role: 'game assistant',
format: ResponseFormat.numberedList,
maxItems: 3,
language: ResponseLanguage.auto, // detects Arabic vs English per message
constraints: ['mobile only', 'no violent games'],
)
Tier 3 — Advanced
Prompt(
role: 'mobile game expert for Egyptian users',
goal: 'suggest games that match the user mood and age',
constraints: ['mobile only', 'no violent games', 'available in Egypt'],
format: ResponseFormat.numberedList,
maxItems: 3,
language: ResponseLanguage.auto,
tone: ResponseTone.friendly,
audience: 'Egyptian teenagers',
examples: [
PromptExample(input: 'fun game', output: 'Hollow Knight — platformer'),
PromptExample(input: 'relaxing', output: 'Stardew Valley — farming sim'),
],
)
Tier 4 — Expert
Prompt(
role: 'game assistant',
chainOfThought: true,
chainSteps: ['identify user mood', 'match game genre', 'select 3 games'],
preventInjection: true, // resists jailbreak attempts
responseAnchor: 'Here are your top 3 games:',
negativePatterns: ['never suggest PC games'],
compressed: false, // verbose output for complex reasoning
)
Ready-made presets
// Use directly
GeminiConfig(systemPrompt: AiPreset.chat)
GeminiConfig(systemPrompt: AiPreset.summarizer)
GeminiConfig(systemPrompt: AiPreset.codeHelper)
GeminiConfig(systemPrompt: AiPreset.stepByStep)
// Customize one field
GeminiConfig(
systemPrompt: AiPreset.chat.copyWith(role: 'Egyptian culture guide'),
)
Stop sequences — pair with the prompt
final prompt = Prompt(
format: ResponseFormat.numberedList,
maxItems: 3,
);
GeminiConfig(
systemPrompt: prompt,
stopSequences: prompt.stopSequences, // → ['[END]'] — model stops exactly here
)
Per-call config override #
Override only what changes for a single call — defaults stay untouched:
// Uses your default config
await FlutterMind.send(userMessage: 'suggest a game');
// Overrides just for this one call
await FlutterMind.send(
userMessage: 'solve this complex math problem',
config: GeminiConfig(
model: GeminiModel.pro25,
temperature: 0.1,
thinkingLevel: ThinkingLevel.deep,
),
);
Thinking models #
Let the model reason before answering — better results on hard problems:
GeminiConfig(
model: GeminiModel.pro25,
thinkingLevel: ThinkingLevel.moderate,
)
// Or set an exact token budget
GeminiConfig(
model: GeminiModel.pro25,
thinkingLevel: CustomThinkingBudget(tokens: 4000),
)
| Level | Tokens | Best For |
|---|---|---|
ThinkingLevel.none |
0 | Fastest, cheapest |
ThinkingLevel.light |
512 | Simple reasoning |
ThinkingLevel.moderate |
2,048 | Coding, math |
ThinkingLevel.deep |
8,192 | Complex problems |
ThinkingLevel.max |
24,576 | Hardest problems |
Access the model's reasoning in the response:
final response = await FlutterMind.send(
userMessage: 'explain quantum entanglement simply',
config: GeminiConfig(
model: GeminiModel.pro25,
thinkingLevel: ThinkingLevel.moderate,
),
);
print(response.text); // the answer
print(response.thinkingText); // how it got there (null if not a thinking model)
print(response.hasThinking); // true / false
Structured JSON output #
Force the model to always return valid, parseable JSON:
GeminiConfig(
model: GeminiModel.flash25,
responseMimeType: 'application/json',
responseSchema: {
'type': 'object',
'properties': {
'name': {'type': 'string'},
'genre': {'type': 'string'},
'rating': {'type': 'number'},
},
'required': ['name', 'genre', 'rating'],
},
)
beforeSend hook — inject runtime context #
Enrich every message with user profile, location, or app state before it reaches the AI:
FlutterMind.init(
engine: GeminiEngine(apiKey: 'YOUR_KEY'),
beforeSend: (message) async {
final user = await UserService.getProfile();
final location = await LocationService.current();
return 'User: ${user.name}, Location: $location\n\n$message';
},
);
// User types: "what restaurants are near me?"
// Model receives: "User: Osama, Location: Cairo, Egypt\n\nwhat restaurants are near me?"
Token management #
// Accurate count — calls the API, always free
final tokens = await FlutterMind.countTokens(userMessage: longText);
if (tokens > 100000) print('Message too long');
// Rough estimate — instant, no API call
// Note: Arabic text uses 2–3× more tokens than English
final estimate = FlutterMind.estimateTokens(message);
Retry configuration #
GeminiEngine(
apiKey: 'YOUR_KEY',
// Default — 2 attempts on 429, 500, 503
retry: RetryConfig(),
// Custom
retry: RetryConfig(
maxAttempts: 5,
delay: Duration(seconds: 2),
retryOn: {429, 503},
),
// Disable
retry: RetryConfig.none,
)
Availability check #
if (!await FlutterMind.isAvailable()) {
showDialog(context, 'AI is currently unavailable. Try again later.');
return;
}
Multiple engines in one app #
Use FlutterMindClient directly when you need more than one engine:
final chatClient = FlutterMindClient(
engine: GeminiEngine(
apiKey: 'YOUR_KEY',
config: GeminiConfig(
model: GeminiModel.flash25,
systemPrompt: Prompt(role: 'friendly chat assistant'),
),
),
);
final summaryClient = FlutterMindClient(
engine: GeminiEngine(
apiKey: 'YOUR_KEY',
config: GeminiConfig(
model: GeminiModel.pro25,
systemPrompt: Prompt(role: 'document summarizer', tone: ResponseTone.concise),
temperature: 0.1,
),
),
);
await chatClient.send(userMessage: 'hello');
await summaryClient.send(userMessage: longDocument);
Local Model (Offline) #
Run AI entirely on the user's device — no API key, no internet, no cost per request. Uses llama.cpp under the hood via Dart FFI.
Platform support #
| Platform | Support | Notes |
|---|---|---|
| Android | ✅ | One-time build.gradle setup |
| iOS | ✅ | Manual Xcode setup required |
| Linux | ✅ | Manual cmake build required |
| macOS | ✅ | Manual cmake build required |
| Windows | ✅ | Manual cmake build required |
| Web | ❌ | Dart FFI not supported on web |
Step 1 — Get a model file #
Models are .gguf files downloaded at runtime to the device. They are not bundled in the app (too large for app stores).
Recommended starter models from HuggingFace:
| Model | Size | Speed | Quality |
|---|---|---|---|
Qwen2.5-1.5B-Instruct-Q4_K_M.gguf |
~1 GB | ⚡ Very fast | Good |
Qwen2.5-3B-Instruct-Q4_K_M.gguf |
~2 GB | Fast | Better |
gemma-3-1b-it-Q4_K_M.gguf |
~0.8 GB | ⚡ Very fast | Good |
Phi-3-mini-4k-instruct-q4.gguf |
~2.2 GB | Fast | Better |
Download in your app on first launch (show a progress bar):
import 'package:path_provider/path_provider.dart';
import 'dart:io';
Future<String> downloadModel() async {
final dir = await getApplicationDocumentsDirectory();
final modelPath = '${dir.path}/models/qwen2.5-1.5b.gguf';
if (File(modelPath).existsSync()) return modelPath; // already downloaded
await Directory('${dir.path}/models').create(recursive: true);
final request = await HttpClient().getUrl(Uri.parse(
'https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct-GGUF/resolve/main/qwen2.5-1.5b-instruct-q4_k_m.gguf',
));
final response = await request.close();
await response.pipe(File(modelPath).openWrite());
return modelPath;
}
Add
path_providerto yourpubspec.yamldependencies.
Step 2 — Platform setup #
Android
2a. Download these three files and place them in your app as shown:
| Download | Save as |
|---|---|
| CMakeLists.txt | android/app/CMakeLists.txt |
| local_model.h | android/native/include/local_model.h |
| local_model.cpp | android/native/src/local_model.cpp |
2b. Add externalNativeBuild to your android/app/build.gradle:
android {
defaultConfig {
externalNativeBuild {
cmake {
abiFilters 'arm64-v8a', 'x86_64'
arguments '-DANDROID_STL=c++_shared'
}
}
}
externalNativeBuild {
cmake {
path 'CMakeLists.txt' // the file you copied in step 2a
version '3.18.1'
}
}
}
2c. Run flutter build apk — Gradle downloads llama.cpp and compiles the library automatically. This takes 5–10 minutes on the first build, then it is cached.
Desktop (Linux / macOS / Windows)
2a. Navigate to the package source and build the library:
cd ~/.pub-cache/hosted/pub.dev/flutter_mind-0.1.0/lib/src/core/engines/local/native
cmake -B build
cmake --build build --config Release
2b. Copy the built library next to your app executable:
# Linux
cp build/liblocal_model.so /path/to/your/app/build/linux/x64/release/bundle/
# macOS
cp build/liblocal_model.dylib /path/to/your/app/build/macos/Build/Products/Release/
# Windows
cp build/Release/local_model.dll /path/to/your/app/build/windows/x64/runner/Release/
Run flutter build linux (or macos / windows) as normal after this.
iOS
2a. Build the static library from the package source on a Mac:
cd ~/.pub-cache/hosted/pub.dev/flutter_mind-0.1.0/lib/src/core/engines/local/ios/Classes
cmake -B build -DCMAKE_SYSTEM_NAME=iOS -DCMAKE_OSX_ARCHITECTURES=arm64 -DCMAKE_OSX_DEPLOYMENT_TARGET=14.0
cmake --build build --config Release
2b. In Xcode:
- Drag the built
liblocal_model.ainto your project - Add it to Link Binary with Libraries in your target's Build Phases
- Add the
include/folder to Header Search Paths
Step 3 — Use it in Dart #
Minimal:
final modelPath = await downloadModel();
final engine = LocalEngine(
config: LocalConfig(modelPath: modelPath),
);
final response = await engine.send(userMessage: 'Hello!');
print(response.text);
engine.dispose(); // free model memory when done
With full config:
final engine = LocalEngine(
config: LocalConfig(
modelPath: modelPath,
systemPrompt: Prompt(role: 'helpful assistant'),
modelType: LocalModelType.qwen, // skip auto-detection
temperature: 0.8,
maxOutputTokens: 512,
contextSize: 4096,
repeatPenalty: 1.1,
topP: 0.9,
topK: 40,
seed: 42, // fixed seed for reproducible output
threads: 4, // CPU threads — 0 = auto-detect
),
);
Streaming (yields the full response at once — true token streaming coming in v2):
engine.stream(userMessage: 'Tell me a story').listen((chunk) {
setState(() => text += chunk);
});
With conversation history:
final history = <ChatMessage>[];
final r1 = await engine.send(userMessage: 'My name is Osama');
history.add(ChatMessage.user('My name is Osama'));
history.add(ChatMessage.model(r1.text));
final r2 = await engine.send(
userMessage: 'What is my name?',
history: history,
);
print(r2.text); // "Your name is Osama"
LocalConfig reference #
| Parameter | Type | Default | Description |
|---|---|---|---|
modelPath |
String |
required | Absolute path to the .gguf file |
systemPrompt |
Prompt? |
null | Model persona and instructions |
modelType |
LocalModelType |
auto |
Chat template format (auto-detected from file metadata) |
temperature |
double? |
0.7 | Creativity — 0.0 deterministic, 2.0 very random |
maxOutputTokens |
int? |
512 | Max tokens to generate per response |
contextSize |
int? |
2048 | How many tokens of history the model remembers |
repeatPenalty |
double? |
1.1 | Penalizes repeated words — range 1.0–2.0 |
topP |
double? |
0.9 | Nucleus sampling threshold |
topK |
int? |
40 | Limits token pool size |
seed |
int? |
random | Fixed seed for reproducible output |
threads |
int? |
auto | CPU threads — 0 auto-detects from device |
LocalModelType values #
| Value | Models |
|---|---|
LocalModelType.auto |
Detects from .gguf metadata — recommended |
LocalModelType.qwen |
Qwen 2, 2.5 |
LocalModelType.llama3 |
Llama 3, 3.1, 3.2 |
LocalModelType.gemma |
Gemma 1, 2, 3 |
LocalModelType.phi |
Phi 2, 3, 4 |
LocalModelType.mistral |
Mistral family |
LocalModelType.deepSeek |
DeepSeek family |
Capabilities #
| Feature | Status |
|---|---|
| Text chat | ✅ |
| System prompt | ✅ |
| Conversation history | ✅ |
| Streaming | ✅ (full response at once — true token streaming coming in v2) |
| Vision / image input | ❌ coming in v2 |
| Audio | ❌ coming in v2 |
Gemini Models #
| Constant | Model ID | Status | Best For |
|---|---|---|---|
GeminiModel.flash25 |
gemini-2.5-flash | ✅ Stable | General use — recommended default |
GeminiModel.flash25Lite |
gemini-2.5-flash-lite | ✅ Stable | High volume, lowest cost |
GeminiModel.pro25 |
gemini-2.5-pro | ✅ Stable | Complex reasoning, analysis |
GeminiModel.flash3Preview |
gemini-3-flash-preview | ⚠️ Preview | Frontier performance |
GeminiModel.flash31Lite |
gemini-3.1-flash-lite | ✅ Stable | Fast, affordable, Gemini 3 |
GeminiModel.pro31Preview |
gemini-3.1-pro-preview | ⚠️ Preview | Most powerful available |
Use CustomModel for any model not listed:
GeminiConfig(model: CustomModel('gemini-4.0-ultra'))
Error Handling #
try {
final response = await FlutterMind.send(userMessage: message);
print(response.text);
} on ValidationException catch (e) {
// Bad input — empty message or exceeds 50,000 characters
print(e.message);
} on EngineException catch (e) {
// API error — invalid key, rate limit, network issue
print(e.message);
print(e.statusCode); // 401, 429, 500 ...
} on FlutterMindException catch (e) {
// Any other flutter_mind error
print(e.message);
}
Common status codes #
| Code | Meaning | Fix |
|---|---|---|
| 400 | Bad request or invalid API key | Check your key at aistudio.google.com/apikey |
| 401 | Unauthorized | API key rejected |
| 403 | No permission | Key may not have access to this model |
| 404 | Model not found | Check model name or use CustomModel |
| 429 | Rate limit | Add RetryConfig or upgrade your API plan |
| 500 | Server error | Temporary — try again |
API Key Security #
Never hardcode API keys in production apps. Anyone can extract them from your APK or IPA.
// During development — environment variable
GeminiEngine(
apiKey: const String.fromEnvironment('GEMINI_KEY'),
)
// In production — proxy through your own backend
// Flutter app → Your server → Gemini API
// The key never leaves your server
Use flutter_dotenv for local .env files.
Roadmap #
v1 — Current #
- ✅ Google Gemini engine
- ✅ Local model engine (llama.cpp — offline, no API key)
- ✅ Send and streaming
- ✅ Multi-turn conversation history
- ✅ Thinking model support (ThinkingLevel presets + custom budget)
- ✅ Structured JSON output
- ✅ Token management (accurate + estimate)
- ✅ Retry configuration
- ✅ Input validation
- ✅ beforeSend hook
- ✅ Prompt engineering system (Prompt, AiPreset, few-shot examples, chain of thought)
v2 — Coming Soon #
- ❌ OpenAI engine
- ❌ Anthropic Claude engine
- ❌ Response parser (JSON → typed Dart objects)
- ❌ True token streaming for local models
- ❌ flutter_mind_vision (image generation)
- ❌ flutter_mind_audio (TTS, STT)
Contributing #
Contributions are welcome. To contribute:
- Fork the repository
- Create a feature branch:
git checkout -b feature/your-feature - Commit your changes with a clear message
- Push and open a Pull Request
License #
MIT — see LICENSE for details.
Built by Mohamed Osama · Egypt 🇪🇬