flutter_mind

Any AI. One interface.

Why flutter_mind? #

Most AI packages for Flutter just wrap the API — you still have to write the prompts, handle errors, manage tokens, and figure out streaming yourself.

flutter_mind does more:

🔌 One API for all providers — same interface for cloud and local models
💬 Multi-turn chat — conversation history with automatic token trimming
⚡ Streaming — typing-effect UI out of the box
🧠 Thinking models — built-in support for reasoning budgets
🛡️ Safe by default — input validation, retry logic, and clear error messages
🎯 Zero Firebase required — API key for cloud, or fully offline with no key at all

Supported Providers #

Provider	Status	Models
Google Gemini	✅ v1	Flash 2.5, Pro 2.5, Flash-Lite, and more
Local Model (offline)	✅ v1	Any `.gguf` — Qwen, Llama, Gemma, Phi, Mistral, and more
OpenAI	🔜 v2	GPT-4o, GPT-4o Mini
Anthropic Claude	🔜 v2	Sonnet, Opus, Haiku
Grok	🔜 v2	—
DeepSeek	🔜 v2	—

Installation #

dependencies:
  flutter_mind: ^0.1.0

flutter pub get

Quick Start #

import 'package:flutter_mind/flutter_mind.dart';

void main() {
  FlutterMind.init(
    engine: GeminiEngine(apiKey: 'YOUR_GEMINI_API_KEY'),
  );
  runApp(MyApp());
}

// Anywhere in your app, inside an async function. No client instance to pass around.
final response = await FlutterMind.send(userMessage: 'suggest a game');
print(response.text);

Three lines in main(). Done.

Getting Your API Key #

Google Gemini — Free tier available #

Go to aistudio.google.com/apikey
Sign in with your Google account
Click Create API Key — no credit card required

OpenAI (coming in v2) #

Go to platform.openai.com → API Keys → Create new secret key

Anthropic Claude (coming in v2) #

Go to console.anthropic.com → API Keys → Create Key

Local Model — No API key needed ✅ #

No account, no key, no internet required. Just a .gguf model file on the device. See the Local Model (Offline) section for full setup.

Usage #

Send a message #

final response = await FlutterMind.send(userMessage: 'what is Flutter?');

print(response.text);          // the response text
print(response.totalTokens);   // total tokens used
print(response.inputTokens);   // tokens in your message
print(response.outputTokens);  // tokens in the response

Streaming — typing effect UI #

FlutterMind.stream(userMessage: 'tell me a story').listen((chunk) {
  setState(() => text += chunk); // text grows chunk by chunk
});
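
In a widget, the subscription usually needs to be cancelled when the widget is disposed, and stream errors need a handler. The following is a minimal sketch in plain Dart, assuming `FlutterMind.stream` returns a `Stream<String>` as the snippet above suggests. `fakeStream` and `listenToChunks` are hypothetical names used only so the example runs without the package.

```dart
import 'dart:async';

// Hypothetical stand-in for FlutterMind.stream(...), assumed to be a Stream<String>.
Stream<String> fakeStream() =>
    Stream.fromIterable(['Once ', 'upon ', 'a time.']);

/// Collects chunks into a buffer and reports each partial text to [onUpdate].
/// Returns the subscription so the caller can cancel it (e.g. in State.dispose).
StreamSubscription<String> listenToChunks(
  Stream<String> chunks, {
  required void Function(String partialText) onUpdate,
  void Function(Object error)? onError,
  void Function()? onDone,
}) {
  final buffer = StringBuffer();
  return chunks.listen(
    (chunk) {
      buffer.write(chunk);
      onUpdate(buffer.toString()); // in a widget: setState(() => text = partial)
    },
    onError: (Object e) => onError?.call(e),
    onDone: onDone,
    cancelOnError: true,
  );
}

Future<void> main() async {
  final done = Completer<void>();
  var text = '';
  final sub = listenToChunks(
    fakeStream(),
    onUpdate: (partial) => text = partial,
    onDone: done.complete,
  );
  await done.future;
  await sub.cancel(); // safe to call even after the stream has finished
  print(text);
}
```

Keeping the returned `StreamSubscription` in a field makes it straightforward to call `cancel()` from `dispose()`, which avoids calling `setState` on an unmounted widget.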

Multi-turn chat — conversation with memory #

final history = <ChatMessage>[];

// First turn
final r1 = await FlutterMind.send(
  userMessage: 'my name is Osama',
  history: history,
);
history.add(ChatMessage.user('my name is Osama'));
history.add(ChatMessage.model(r1.text));

// Second turn — model remembers the name
final r2 = await FlutterMind.send(
  userMessage: 'what is my name?',
  history: history,
  maxHistoryMessages: 20, // oldest turns are dropped automatically
);
print(r2.text); // "Your name is Osama"
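
How the package trims history internally is not described here. As an illustration of what a message-count limit such as `maxHistoryMessages` implies, the sketch below keeps only the most recent entries. `Msg` and `trimHistory` are hypothetical names, not part of flutter_mind.

```dart
// Hypothetical minimal message type; flutter_mind's ChatMessage may differ.
class Msg {
  final String role; // 'user' or 'model'
  final String text;
  const Msg(this.role, this.text);
}

/// Returns at most [maxMessages] of the most recent messages,
/// dropping the oldest ones first. The input list is not modified.
List<Msg> trimHistory(List<Msg> history, int maxMessages) {
  if (maxMessages <= 0) return const [];
  if (history.length <= maxMessages) return List.of(history);
  return history.sublist(history.length - maxMessages);
}

void main() {
  final history = [
    for (var i = 1; i <= 6; i++) Msg(i.isOdd ? 'user' : 'model', 'message $i'),
  ];
  final trimmed = trimHistory(history, 4);
  print(trimmed.map((m) => m.text).join(', '));
  // message 3, message 4, message 5, message 6
}
```

Choosing an even limit keeps user and model turns paired, so the trimmed history still starts with a user message.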

Engine configuration #

Set your defaults once — every call uses them automatically:

FlutterMind.init(
  engine: GeminiEngine(
    apiKey: 'YOUR_KEY',
    config: GeminiConfig(
      model: GeminiModel.flash25,
      systemPrompt: Prompt(role: 'game suggestion assistant'),
      temperature: 0.8,
      maxOutputTokens: 500,
    ),
  ),
);

Prompt engineering #

Control how the model behaves with the Prompt class — from one field to full expert config.

Tier 1 — Minimal

GeminiConfig(
  systemPrompt: Prompt(role: 'game suggestion assistant'),
)

Tier 2 — Standard

Prompt(
  role: 'game assistant',
  format: ResponseFormat.numberedList,
  maxItems: 3,
  language: ResponseLanguage.auto, // detects Arabic vs English per message
  constraints: ['mobile only', 'no violent games'],
)

Tier 3 — Advanced

Prompt(
  role: 'mobile game expert for Egyptian users',
  goal: 'suggest games that match the user mood and age',
  constraints: ['mobile only', 'no violent games', 'available in Egypt'],
  format: ResponseFormat.numberedList,
  maxItems: 3,
  language: ResponseLanguage.auto,
  tone: ResponseTone.friendly,
  audience: 'Egyptian teenagers',
  examples: [
    PromptExample(input: 'fun game', output: 'Hollow Knight — platformer'),
    PromptExample(input: 'relaxing', output: 'Stardew Valley — farming sim'),
  ],
)

Tier 4 — Expert

Prompt(
  role: 'game assistant',
  chainOfThought: true,
  chainSteps: ['identify user mood', 'match game genre', 'select 3 games'],
  preventInjection: true,        // resists jailbreak attempts
  responseAnchor: 'Here are your top 3 games:',
  negativePatterns: ['never suggest PC games'],
  compressed: false,             // verbose output for complex reasoning
)

Ready-made presets

// Use directly
GeminiConfig(systemPrompt: AiPreset.chat)
GeminiConfig(systemPrompt: AiPreset.summarizer)
GeminiConfig(systemPrompt: AiPreset.codeHelper)
GeminiConfig(systemPrompt: AiPreset.stepByStep)

// Customize one field
GeminiConfig(
  systemPrompt: AiPreset.chat.copyWith(role: 'Egyptian culture guide'),
)

Stop sequences — pair with the prompt

final prompt = Prompt(
  format: ResponseFormat.numberedList,
  maxItems: 3,
);

GeminiConfig(
  systemPrompt: prompt,
  stopSequences: prompt.stopSequences, // → ['[END]'] — model stops exactly here
)

Per-call config override #

Override only what changes for a single call — defaults stay untouched:

// Uses your default config
await FlutterMind.send(userMessage: 'suggest a game');

// Overrides just for this one call
await FlutterMind.send(
  userMessage: 'solve this complex math problem',
  config: GeminiConfig(
    model: GeminiModel.pro25,
    temperature: 0.1,
    thinkingLevel: ThinkingLevel.deep,
  ),
);

Thinking models #

Let the model reason before answering — better results on hard problems:

GeminiConfig(
  model: GeminiModel.pro25,
  thinkingLevel: ThinkingLevel.moderate,
)

// Or set an exact token budget
GeminiConfig(
  model: GeminiModel.pro25,
  thinkingLevel: CustomThinkingBudget(tokens: 4000),
)

Level	Tokens	Best For
`ThinkingLevel.none`	0	Fastest, cheapest
`ThinkingLevel.light`	512	Simple reasoning
`ThinkingLevel.moderate`	2,048	Coding, math
`ThinkingLevel.deep`	8,192	Complex problems
`ThinkingLevel.max`	24,576	Hardest problems
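
To choose between a preset and a custom budget, it can help to see which preset is the smallest one that covers a given token count. The sketch below copies the budgets from the table into a hypothetical `Level` enum (a stand-in, not the package's `ThinkingLevel` type).

```dart
// Token budgets copied from the table above; the enum itself is a
// hypothetical stand-in, not the package's ThinkingLevel type.
enum Level {
  none(0),
  light(512),
  moderate(2048),
  deep(8192),
  max(24576);

  const Level(this.tokens);
  final int tokens;
}

/// Returns the smallest preset whose budget covers [neededTokens],
/// or the largest preset if none does.
Level smallestLevelFor(int neededTokens) {
  for (final level in Level.values) {
    if (level.tokens >= neededTokens) return level;
  }
  return Level.max;
}

void main() {
  print(smallestLevelFor(0).name); // none
  print(smallestLevelFor(4000).name); // deep
  print(smallestLevelFor(50000).name); // max
}
```

A 4,000-token budget falls between `moderate` and `deep`, which is the kind of case where a custom budget avoids paying for the full 8,192 tokens.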

Access the model's reasoning in the response:

final response = await FlutterMind.send(
  userMessage: 'explain quantum entanglement simply',
  config: GeminiConfig(
    model: GeminiModel.pro25,
    thinkingLevel: ThinkingLevel.moderate,
  ),
);

print(response.text);         // the answer
print(response.thinkingText); // how it got there (null if not a thinking model)
print(response.hasThinking);  // true / false

Structured JSON output #

Force the model to always return valid, parseable JSON:

GeminiConfig(
  model: GeminiModel.flash25,
  responseMimeType: 'application/json',
  responseSchema: {
    'type': 'object',
    'properties': {
      'name':   {'type': 'string'},
      'genre':  {'type': 'string'},
      'rating': {'type': 'number'},
    },
    'required': ['name', 'genre', 'rating'],
  },
)
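
Once the model returns JSON matching the schema, the text can be decoded with `dart:convert`. A minimal sketch, assuming `response.text` holds the raw JSON string; the `Game` class and the sample payload are hypothetical.

```dart
import 'dart:convert';

// Hypothetical model class matching the schema above.
class Game {
  final String name;
  final String genre;
  final double rating;

  const Game({required this.name, required this.genre, required this.rating});

  factory Game.fromJson(Map<String, dynamic> json) => Game(
        name: json['name'] as String,
        genre: json['genre'] as String,
        // JSON numbers may decode as int or double, so go through num.
        rating: (json['rating'] as num).toDouble(),
      );
}

void main() {
  // Stand-in for response.text from a call that used the schema above.
  const responseText =
      '{"name": "Stardew Valley", "genre": "farming sim", "rating": 9}';

  final game = Game.fromJson(jsonDecode(responseText) as Map<String, dynamic>);
  print('${game.name} (${game.genre}): ${game.rating}');
}
```

Wrapping the decode in a `try`/`on FormatException` block is still a reasonable precaution, since a response cut off by `maxOutputTokens` may not be complete JSON.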

beforeSend hook — inject runtime context #

Enrich every message with user profile, location, or app state before it reaches the AI:

FlutterMind.init(
  engine: GeminiEngine(apiKey: 'YOUR_KEY'),
  beforeSend: (message) async {
    final user = await UserService.getProfile();
    final location = await LocationService.current();
    return 'User: ${user.name}, Location: $location\n\n$message';
  },
);

// User types: "what restaurants are near me?"
// Model receives: "User: Osama, Location: Cairo, Egypt\n\nwhat restaurants are near me?"

Token management #

// Accurate count — calls the API, always free
final tokens = await FlutterMind.countTokens(userMessage: longText);
if (tokens > 100000) print('Message too long');

// Rough estimate — instant, no API call
// Note: Arabic text uses 2–3× more tokens than English
final estimate = FlutterMind.estimateTokens(message);
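
For intuition about what a rough estimate involves, the sketch below uses the common rule of thumb of about four characters per token for English text. This is an assumption for illustration, not the algorithm `estimateTokens` uses; `roughTokenEstimate` and `charsPerToken` are hypothetical names.

```dart
/// Rough token estimate: about one token per 4 characters.
/// [charsPerToken] is an assumed rule of thumb for English text, not the
/// package's actual algorithm; use a smaller value for Arabic.
int roughTokenEstimate(String text, {double charsPerToken = 4.0}) {
  if (text.isEmpty) return 0;
  return (text.runes.length / charsPerToken).ceil();
}

void main() {
  const english = 'What is Flutter?'; // 16 characters
  print(roughTokenEstimate(english)); // 4
  print(roughTokenEstimate(english, charsPerToken: 2.0)); // 8
}
```

Halving `charsPerToken` doubles the estimate, which is one simple way to account for languages that tokenize less efficiently.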

Retry configuration #

// Default: 2 attempts on 429, 500, 503
GeminiEngine(apiKey: 'YOUR_KEY', retry: RetryConfig());

// Custom
GeminiEngine(
  apiKey: 'YOUR_KEY',
  retry: RetryConfig(
    maxAttempts: 5,
    delay: Duration(seconds: 2),
    retryOn: {429, 503},
  ),
);

// Disable
GeminiEngine(apiKey: 'YOUR_KEY', retry: RetryConfig.none);
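
The general idea behind a retry policy like this can be sketched in plain Dart. The helper below is not the package's implementation: `withRetry`, `HttpFailure`, the one-second default delay, and the linear backoff are all assumptions chosen for illustration. Only the default status codes and attempt count mirror the values mentioned above.

```dart
import 'dart:async';

/// Hypothetical error type carrying an HTTP status code.
class HttpFailure implements Exception {
  final int statusCode;
  const HttpFailure(this.statusCode);
}

/// Runs [action] up to [maxAttempts] times. Retries only when the failure's
/// status code is in [retryOn], waiting [delay] * attempt number between tries.
Future<T> withRetry<T>(
  Future<T> Function() action, {
  int maxAttempts = 2,
  Duration delay = const Duration(seconds: 1),
  Set<int> retryOn = const {429, 500, 503},
}) async {
  for (var attempt = 1;; attempt++) {
    try {
      return await action();
    } on HttpFailure catch (e) {
      final canRetry = attempt < maxAttempts && retryOn.contains(e.statusCode);
      if (!canRetry) rethrow;
      await Future<void>.delayed(delay * attempt); // linear backoff
    }
  }
}

Future<void> main() async {
  var calls = 0;
  final result = await withRetry(
    () async {
      calls++;
      if (calls < 3) throw const HttpFailure(503); // fail twice, then succeed
      return 'ok';
    },
    maxAttempts: 5,
    delay: const Duration(milliseconds: 10),
  );
  print('$result after $calls calls'); // ok after 3 calls
}
```

Status codes outside `retryOn` (for example 401) are rethrown immediately, since repeating a request with a rejected key cannot succeed.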

Availability check #

if (!await FlutterMind.isAvailable()) {
  showDialog(
    context: context,
    builder: (_) => const AlertDialog(content: Text('AI is currently unavailable. Try again later.')),
  );
  return;
}

Multiple engines in one app #

Use FlutterMindClient directly when you need more than one engine:

final chatClient = FlutterMindClient(
  engine: GeminiEngine(
    apiKey: 'YOUR_KEY',
    config: GeminiConfig(
      model: GeminiModel.flash25,
      systemPrompt: Prompt(role: 'friendly chat assistant'),
    ),
  ),
);

final summaryClient = FlutterMindClient(
  engine: GeminiEngine(
    apiKey: 'YOUR_KEY',
    config: GeminiConfig(
      model: GeminiModel.pro25,
      systemPrompt: Prompt(role: 'document summarizer', tone: ResponseTone.concise),
      temperature: 0.1,
    ),
  ),
);

await chatClient.send(userMessage: 'hello');
await summaryClient.send(userMessage: longDocument);

Local Model (Offline) #

Run AI entirely on the user's device — no API key, no internet, no cost per request. Uses llama.cpp under the hood via Dart FFI.

Platform support #

Platform	Support	Notes
Android	✅	One-time `build.gradle` setup
iOS	✅	Manual Xcode setup required
Linux	✅	Manual cmake build required
macOS	✅	Manual cmake build required
Windows	✅	Manual cmake build required
Web	❌	Dart FFI not supported on web

Step 1 — Get a model file #

Models are .gguf files downloaded at runtime to the device. They are not bundled in the app (too large for app stores).

Recommended starter models from HuggingFace:

Model	Size	Speed	Quality
`Qwen2.5-1.5B-Instruct-Q4_K_M.gguf`	~1 GB	⚡ Very fast	Good
`Qwen2.5-3B-Instruct-Q4_K_M.gguf`	~2 GB	Fast	Better
`gemma-3-1b-it-Q4_K_M.gguf`	~0.8 GB	⚡ Very fast	Good
`Phi-3-mini-4k-instruct-q4.gguf`	~2.2 GB	Fast	Better

Download in your app on first launch (show a progress bar):

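import 'dart:io';
import 'package:path_provider/path_provider.dart';

Future<String> downloadModel() async {
  final dir = await getApplicationDocumentsDirectory();
  final modelPath = '${dir.path}/models/qwen2.5-1.5b.gguf';

  if (File(modelPath).existsSync()) return modelPath; // already downloaded

  await Directory('${dir.path}/models').create(recursive: true);

  final client = HttpClient();
  try {
    final request = await client.getUrl(Uri.parse(
      'https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct-GGUF/resolve/main/qwen2.5-1.5b-instruct-q4_k_m.gguf',
    ));
    final response = await request.close();
    if (response.statusCode != HttpStatus.ok) {
      throw HttpException('Model download failed: HTTP ${response.statusCode}');
    }

    // Write to a temporary file first so an interrupted download is never
    // mistaken for a complete model on the next launch.
    final partFile = File('$modelPath.part');
    await response.pipe(partFile.openWrite());
    await partFile.rename(modelPath);
  } finally {
    client.close(); // release the connection whether or not the download succeeded
  }

  return modelPath;
}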

Add path_provider to your pubspec.yaml dependencies.
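
The download helper above does not report progress. One way to drive a progress bar is to count bytes as they pass from the response to the file. The sketch below shows the idea with in-memory stand-ins; `copyWithProgress` is a hypothetical helper, and in a real download `totalBytes` would come from the response's `contentLength` and the sink from `File.openWrite()`.

```dart
import 'dart:async';

/// Pipes [source] into [sink] and reports progress in the range 0.0 to 1.0.
/// When [totalBytes] is unknown (<= 0) no progress is reported.
/// Returns the number of bytes copied.
Future<int> copyWithProgress(
  Stream<List<int>> source,
  StreamConsumer<List<int>> sink, {
  required int totalBytes,
  required void Function(double fraction) onProgress,
}) async {
  var received = 0;
  // map() observes each chunk; pipe() handles backpressure and closes the sink.
  await source.map((chunk) {
    received += chunk.length;
    if (totalBytes > 0) onProgress(received / totalBytes);
    return chunk;
  }).pipe(sink);
  return received;
}

Future<void> main() async {
  // In-memory stand-ins for the HTTP response and the file sink.
  final source = Stream.fromIterable([
    [1, 2, 3, 4],
    [5, 6, 7, 8],
  ]);
  final sink = StreamController<List<int>>();
  final written = sink.stream.expand((c) => c).toList();

  final total = await copyWithProgress(
    source,
    sink,
    totalBytes: 8,
    onProgress: (f) => print('${(f * 100).round()}%'), // 50%, then 100%
  );
  print('$total bytes copied, ${(await written).length} bytes written');
}
```

Using `pipe` rather than adding chunks manually lets the sink slow the source down, which matters for model files that are one gigabyte or larger.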

Step 2 — Platform setup #

Android

2a. Download these three files and place them in your app as shown:

Download	Save as
CMakeLists.txt	`android/app/CMakeLists.txt`
local_model.h	`android/native/include/local_model.h`
local_model.cpp	`android/native/src/local_model.cpp`

2b. Add externalNativeBuild to your android/app/build.gradle:

android {
    defaultConfig {
        externalNativeBuild {
            cmake {
                abiFilters 'arm64-v8a', 'x86_64'
                arguments '-DANDROID_STL=c++_shared'
            }
        }
    }
    externalNativeBuild {
        cmake {
            path 'CMakeLists.txt'   // the file you copied in step 2a
            version '3.18.1'
        }
    }
}

2c. Run flutter build apk — Gradle downloads llama.cpp and compiles the library automatically. This takes 5–10 minutes on the first build, then it is cached.

Desktop (Linux / macOS / Windows)

2a. Navigate to the package source and build the library:

cd ~/.pub-cache/hosted/pub.dev/flutter_mind-0.1.0/lib/src/core/engines/local/native
cmake -B build
cmake --build build --config Release

2b. Copy the built library next to your app executable:

# Linux
cp build/liblocal_model.so /path/to/your/app/build/linux/x64/release/bundle/

# macOS
cp build/liblocal_model.dylib /path/to/your/app/build/macos/Build/Products/Release/

# Windows
cp build/Release/local_model.dll /path/to/your/app/build/windows/x64/runner/Release/

Run flutter build linux (or macos / windows) as normal after this.

iOS

2a. Build the static library from the package source on a Mac:

cd ~/.pub-cache/hosted/pub.dev/flutter_mind-0.1.0/lib/src/core/engines/local/ios/Classes
cmake -B build -DCMAKE_SYSTEM_NAME=iOS -DCMAKE_OSX_ARCHITECTURES=arm64 -DCMAKE_OSX_DEPLOYMENT_TARGET=14.0
cmake --build build --config Release

2b. In Xcode:

Drag the built liblocal_model.a into your project
Add it to Link Binary with Libraries in your target's Build Phases
Add the include/ folder to Header Search Paths

Step 3 — Use it in Dart #

Minimal:

final modelPath = await downloadModel();

final engine = LocalEngine(
  config: LocalConfig(modelPath: modelPath),
);

final response = await engine.send(userMessage: 'Hello!');
print(response.text);

engine.dispose(); // free model memory when done

With full config:

final engine = LocalEngine(
  config: LocalConfig(
    modelPath: modelPath,
    systemPrompt: Prompt(role: 'helpful assistant'),
    modelType: LocalModelType.qwen,  // skip auto-detection
    temperature: 0.8,
    maxOutputTokens: 512,
    contextSize: 4096,
    repeatPenalty: 1.1,
    topP: 0.9,
    topK: 40,
    seed: 42,        // fixed seed for reproducible output
    threads: 4,      // CPU threads — 0 = auto-detect
  ),
);

Streaming (yields the full response at once — true token streaming coming in v2):

engine.stream(userMessage: 'Tell me a story').listen((chunk) {
  setState(() => text += chunk);
});

With conversation history:

final history = <ChatMessage>[];

final r1 = await engine.send(userMessage: 'My name is Osama');
history.add(ChatMessage.user('My name is Osama'));
history.add(ChatMessage.model(r1.text));

final r2 = await engine.send(
  userMessage: 'What is my name?',
  history: history,
);
print(r2.text); // "Your name is Osama"

LocalConfig reference #

Parameter	Type	Default	Description
`modelPath`	`String`	required	Absolute path to the `.gguf` file
`systemPrompt`	`Prompt?`	null	Model persona and instructions
`modelType`	`LocalModelType`	`auto`	Chat template format (auto-detected from file metadata)
`temperature`	`double?`	0.7	Creativity — 0.0 deterministic, 2.0 very random
`maxOutputTokens`	`int?`	512	Max tokens to generate per response
`contextSize`	`int?`	2048	Context window in tokens, shared by the system prompt, history, and response
`repeatPenalty`	`double?`	1.1	Penalizes repeated words — range 1.0–2.0
`topP`	`double?`	0.9	Nucleus sampling threshold
`topK`	`int?`	40	Limits token pool size
`seed`	`int?`	random	Fixed seed for reproducible output
`threads`	`int?`	auto	CPU threads — 0 auto-detects from device
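
Because a llama.cpp context window holds the prompt and the generated tokens together, `contextSize` and `maxOutputTokens` interact. The arithmetic below is a sketch of that budget; `historyBudget` is a hypothetical helper, and how the package itself allocates the window is an assumption.

```dart
/// Tokens left for conversation history after reserving room for the
/// system prompt and the generated response. Never negative.
int historyBudget({
  required int contextSize,
  required int maxOutputTokens,
  int systemPromptTokens = 0,
}) {
  final remaining = contextSize - maxOutputTokens - systemPromptTokens;
  return remaining < 0 ? 0 : remaining;
}

void main() {
  // Defaults from the table: contextSize 2048, maxOutputTokens 512.
  print(historyBudget(contextSize: 2048, maxOutputTokens: 512)); // 1536

  // Larger window, with an assumed 100-token system prompt.
  print(historyBudget(
    contextSize: 4096,
    maxOutputTokens: 512,
    systemPromptTokens: 100,
  )); // 3484
}
```

With the defaults, roughly 1,500 tokens remain for the conversation, so long chats on a small context window benefit from trimming history.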

LocalModelType values #

Value	Models
`LocalModelType.auto`	Detects from `.gguf` metadata — recommended
`LocalModelType.qwen`	Qwen 2, 2.5
`LocalModelType.llama3`	Llama 3, 3.1, 3.2
`LocalModelType.gemma`	Gemma 1, 2, 3
`LocalModelType.phi`	Phi 2, 3, 4
`LocalModelType.mistral`	Mistral family
`LocalModelType.deepSeek`	DeepSeek family

Capabilities #

Feature	Status
Text chat	✅
System prompt	✅
Conversation history	✅
Streaming	✅ (full response at once — true token streaming coming in v2)
Vision / image input	❌ coming in v2
Audio	❌ coming in v2

Gemini Models #

Constant	Model ID	Status	Best For
`GeminiModel.flash25`	gemini-2.5-flash	✅ Stable	General use — recommended default
`GeminiModel.flash25Lite`	gemini-2.5-flash-lite	✅ Stable	High volume, lowest cost
`GeminiModel.pro25`	gemini-2.5-pro	✅ Stable	Complex reasoning, analysis
`GeminiModel.flash3Preview`	gemini-3-flash-preview	⚠️ Preview	Frontier performance
`GeminiModel.flash31Lite`	gemini-3.1-flash-lite	✅ Stable	Fast, affordable, Gemini 3
`GeminiModel.pro31Preview`	gemini-3.1-pro-preview	⚠️ Preview	Most powerful available

Use CustomModel for any model not listed:

GeminiConfig(model: CustomModel('gemini-4.0-ultra'))

Error Handling #

try {
  final response = await FlutterMind.send(userMessage: message);
  print(response.text);
} on ValidationException catch (e) {
  // Bad input — empty message or exceeds 50,000 characters
  print(e.message);
} on EngineException catch (e) {
  // API error — invalid key, rate limit, network issue
  print(e.message);
  print(e.statusCode); // 401, 429, 500 ...
} on FlutterMindException catch (e) {
  // Any other flutter_mind error
  print(e.message);
}

Common status codes #

Code	Meaning	Fix
400	Bad request or invalid API key	Check your key at aistudio.google.com/apikey
401	Unauthorized	API key rejected; verify the key or create a new one
403	No permission	Key may not have access to this model
404	Model not found	Check model name or use `CustomModel`
429	Rate limit	Add `RetryConfig` or upgrade your API plan
500	Server error	Temporary — try again
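
In an app, these codes are typically turned into short user-facing messages. A minimal sketch follows, assuming the status code is available as a nullable `int` (as `e.statusCode` in the example above suggests). `friendlyMessage` and its wording are hypothetical; 503 is included because it appears in the retry defaults.

```dart
/// Maps an HTTP status code to a short user-facing message.
/// Wording is illustrative; codes follow the table above.
String friendlyMessage(int? statusCode) {
  switch (statusCode) {
    case 400:
    case 401:
    case 403:
      return 'There is a problem with the API key or its permissions.';
    case 404:
      return 'The requested model was not found.';
    case 429:
      return 'Too many requests. Please wait a moment and try again.';
    case 500:
    case 503:
      return 'The AI service is temporarily unavailable. Try again later.';
    default:
      return 'Something went wrong. Please try again.';
  }
}

void main() {
  print(friendlyMessage(429));
  print(friendlyMessage(null)); // e.g. a network error without a status code
}
```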

API Key Security #

Never hardcode API keys in production apps. Anyone can extract them from your APK or IPA.

// During development: compile-time value passed with --dart-define=GEMINI_KEY=...
GeminiEngine(
  apiKey: const String.fromEnvironment('GEMINI_KEY'),
)

// In production — proxy through your own backend
// Flutter app → Your server → Gemini API
// The key never leaves your server

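Use flutter_dotenv for local .env files during development. A bundled .env file still ships inside the app, so it does not protect production keys.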
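
`String.fromEnvironment` returns an empty string when the app is built without the matching `--dart-define`, so a missing key can go unnoticed until the first request fails. The sketch below fails fast instead; `requireKey` is a hypothetical helper, and the define name matches the snippet above.

```dart
// Compile-time value; empty when the app is built without
// --dart-define=GEMINI_KEY=... (the define name matches the snippet above).
const geminiKey = String.fromEnvironment('GEMINI_KEY');

/// Returns [key] if it is non-empty, otherwise throws a descriptive error.
String requireKey(String key, {String name = 'GEMINI_KEY'}) {
  if (key.trim().isEmpty) {
    throw StateError('Missing $name. Pass it with --dart-define=$name=<value>.');
  }
  return key;
}

void main() {
  try {
    final key = requireKey(geminiKey);
    print('Key loaded (${key.length} characters)');
  } on StateError catch (e) {
    print(e.message);
  }
}
```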

Roadmap #

v1 — Current #

✅ Google Gemini engine
✅ Local model engine (llama.cpp — offline, no API key)
✅ Send and streaming
✅ Multi-turn conversation history
✅ Thinking model support (ThinkingLevel presets + custom budget)
✅ Structured JSON output
✅ Token management (accurate + estimate)
✅ Retry configuration
✅ Input validation
✅ beforeSend hook
✅ Prompt engineering system (Prompt, AiPreset, few-shot examples, chain of thought)

v2 — Coming Soon #

❌ OpenAI engine
❌ Anthropic Claude engine
❌ Response parser (JSON → typed Dart objects)
❌ True token streaming for local models
❌ flutter_mind_vision (image generation)
❌ flutter_mind_audio (TTS, STT)

Contributing #

Contributions are welcome. To contribute:

Fork the repository
Create a feature branch: git checkout -b feature/your-feature
Commit your changes with a clear message
Push and open a Pull Request

License #

MIT — see LICENSE for details.

Built by Mohamed Osama · Egypt 🇪🇬
