genkit_flutter_gemma

Genkit Dart plugin for flutter_gemma — local on-device AI inference via Google Gemma and other supported models.


Features

  • Wraps flutter_gemma as a Genkit model provider
  • Supports text generation (blocking and streaming)
  • Embeddings via FlutterGemmaEmbedder
  • Multimodal input (images, audio) — supports data: URIs, file:// paths, and http(s):// URLs
  • Function calling / tool use with toolChoice control (auto, required, none)
  • Parallel tool calls — multiple function calls in a single model response
  • Thinking mode (Gemma 4, DeepSeek)
  • Generation latency tracking via latencyMs in responses
  • Configurable via @Schema()-annotated options
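
The multimodal bullet above names three kinds of media reference. As a rough illustration of how a host app might tell them apart before building a request, here is a minimal sketch in plain Dart. The `MediaSource` enum and `classifyMediaUrl` function are hypothetical names introduced for this example; they are not part of the plugin, and the plugin's own resolution logic may differ.

```dart
/// Where a media reference points, based on the URI schemes listed above.
/// Hypothetical type for illustration; not part of the plugin API.
enum MediaSource { inlineData, localFile, remoteUrl, unsupported }

/// Classifies a media reference by its URI scheme.
/// Hypothetical helper; the plugin may resolve media differently.
MediaSource classifyMediaUrl(String url) {
  final uri = Uri.tryParse(url);
  if (uri == null) return MediaSource.unsupported;
  switch (uri.scheme) {
    case 'data':
      return MediaSource.inlineData; // bytes embedded in the URI itself
    case 'file':
      return MediaSource.localFile; // read from the device file system
    case 'http':
    case 'https':
      return MediaSource.remoteUrl; // must be downloaded first
    default:
      return MediaSource.unsupported; // e.g. a bare relative path
  }
}
```

Checking the scheme up front lets an app reject unsupported references, such as a bare file name without `file://`, before starting inference.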

Supported Model Architectures

| Architecture | ModelType | Notes |
|---|---|---|
| Gemma 3 / Gemma 4 IT | ModelType.gemmaIt | Default; multimodal (image, audio); thinking mode for Gemma 4 |
| DeepSeek | ModelType.deepSeek | Thinking mode |
| Qwen / Qwen3 | ModelType.qwen / ModelType.qwen3 | Qwen3 supports thinking mode |
| Llama | ModelType.llama | |
| Phi | ModelType.phi | Phi-4 |
| FunctionGemma | ModelType.functionGemma | Specialized function calling |

Quick Start

import 'package:flutter_gemma/flutter_gemma.dart';
import 'package:genkit/genkit.dart';
import 'package:genkit_flutter_gemma/genkit_flutter_gemma.dart';

// Initialize and install model (host app responsibility)
await FlutterGemma.initialize();
await FlutterGemma.installModel(modelType: ModelType.gemmaIt)
    .fromAsset('assets/gemma-3-1b-it-int4.task')
    .install();

// Create Genkit with plugin
final ai = Genkit(plugins: [
  GenkitFlutterGemmaPlugin(
    models: [
      FlutterGemmaModelConfig(
        name: 'gemma-3-nano',
        modelType: ModelType.gemmaIt,
      ),
    ],
    embedders: [
      FlutterGemmaEmbedderConfig(name: 'embedding-gemma-300m'),
    ],
  ),
]);

// Generate
final response = await ai.generate(
  model: flutterGemma.model('gemma-3-nano'),
  prompt: 'Hello!',
);
print(response.text);

Configuration

Pass FlutterGemmaModelOptions to customize inference:

final response = await ai.generate(
  model: flutterGemma.model('gemma-3-nano'),
  prompt: 'Hello!',
  config: FlutterGemmaModelOptions(
    maxTokens: 2048,
    temperature: 0.5,
    topK: 40,
    supportImage: true,
  ),
);

Supported options and their defaults:

| Option | Type | Default | Description |
|---|---|---|---|
| maxTokens | int? | 1024 | Maximum tokens to generate |
| temperature | double? | 0.8 | Sampling temperature |
| topK | int? | 1 | Top-K sampling |
| topP | double? | null | Top-P (nucleus) sampling |
| supportImage | bool? | false | Enable multimodal image input |
| supportAudio | bool? | false | Enable audio input (Gemma 3n) |
| isThinking | bool? | false | Enable thinking mode (Gemma 4, DeepSeek) |
| randomSeed | int? | 1 | Random seed for deterministic output |
| toolChoice | String? | 'auto' | Tool calling mode: 'auto', 'required', 'none' |
| systemInstruction | String? | null | System-level instruction (overrides system-role messages) |
| maxFunctionBufferLength | int? | null | Max token buffer for streaming tool-call arguments (increase for large payloads) |
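
Every option is nullable, so any field left unset falls back to the default listed above. The sketch below models that fallback with plain maps so the effective values are easy to inspect. `documentedDefaults` and `resolveOptions` are hypothetical names for this example, and how the plugin merges options internally is an assumption, not something this README confirms.

```dart
/// Defaults copied from the options list above.
/// Hypothetical constant for illustration; not part of the plugin API.
const Map<String, Object?> documentedDefaults = {
  'maxTokens': 1024,
  'temperature': 0.8,
  'topK': 1,
  'topP': null,
  'supportImage': false,
  'supportAudio': false,
  'isThinking': false,
  'randomSeed': 1,
  'toolChoice': 'auto',
  'systemInstruction': null,
  'maxFunctionBufferLength': null,
};

/// Returns the effective options: documented defaults, overridden by every
/// non-null entry in [overrides]. Unknown keys are rejected early.
/// Assumption: the plugin's real merge behaviour may differ.
Map<String, Object?> resolveOptions(Map<String, Object?> overrides) {
  final unknown =
      overrides.keys.where((k) => !documentedDefaults.containsKey(k)).toList();
  if (unknown.isNotEmpty) {
    throw ArgumentError('Unknown option(s): $unknown');
  }
  final resolved = Map<String, Object?>.of(documentedDefaults);
  overrides.forEach((key, value) {
    if (value != null) resolved[key] = value; // null means "use the default"
  });
  return resolved;
}
```

For example, `resolveOptions({'temperature': 0.5, 'topK': 40})` keeps `maxTokens` at 1024 while changing only the two supplied values.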

Streaming

import 'dart:io'; // provides stdout

final stream = ai.generateStream(
  model: flutterGemma.model('gemma-3-nano'),
  prompt: 'Write a story.',
);

await for (final chunk in stream) {
  stdout.write(chunk.text);
}
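
In a UI, the usual pattern is to show each chunk as it arrives and also keep the full text once the stream ends. A minimal sketch of that accumulation in plain Dart follows. `collectText` is a hypothetical helper, not part of the plugin, and it assumes the chunk stream has first been mapped to non-null strings.

```dart
/// Concatenates streamed text pieces into the full response text, calling
/// [onChunk] for each piece as it arrives (for example to update a widget).
/// Hypothetical helper for illustration; not part of the plugin API.
Future<String> collectText(
  Stream<String> pieces, {
  void Function(String piece)? onChunk,
}) async {
  final buffer = StringBuffer();
  await for (final piece in pieces) {
    onChunk?.call(piece); // incremental update
    buffer.write(piece); // keep the running total
  }
  return buffer.toString();
}
```

Assuming `chunk.text` is a `String`, the stream from the example above could be adapted with something like `collectText(stream.map((chunk) => chunk.text))`. If `text` turns out to be nullable, the mapping would need a fallback such as `chunk.text ?? ''`.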

Tool Use

// weatherTool is assumed to be a Genkit tool already defined by the host app.
final response = await ai.generate(
  model: flutterGemma.model('gemma-3-nano'),
  prompt: 'What is the weather in Paris?',
  tools: [weatherTool],
);
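
The toolChoice option accepts three documented string values: 'auto', 'required', and 'none'. Because it is a plain string, a typo would otherwise only surface at inference time. The sketch below shows one way a host app might validate the value first. `allowedToolChoices` and `resolveToolChoice` are hypothetical names, and whether the plugin performs a similar check itself is not stated in this README.

```dart
/// Tool-calling modes documented for the toolChoice option.
/// Hypothetical constant for illustration; not part of the plugin API.
const Set<String> allowedToolChoices = {'auto', 'required', 'none'};

/// Returns [value] if it is a documented mode, or 'auto' (the documented
/// default) when [value] is null. Throws [ArgumentError] for anything else.
String resolveToolChoice(String? value) {
  if (value == null) return 'auto';
  if (!allowedToolChoices.contains(value)) {
    throw ArgumentError.value(
      value,
      'toolChoice',
      'expected one of $allowedToolChoices',
    );
  }
  return value;
}
```

The validated string can then be passed as `toolChoice` in `FlutterGemmaModelOptions`.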

Embeddings

// Install embedding model + tokenizer (host app responsibility)
await FlutterGemma.installEmbedder()
    .modelFromNetwork('https://huggingface.co/.../embeddinggemma-300M.tflite')
    .tokenizerFromNetwork('https://huggingface.co/.../sentencepiece.model')
    .install();

// Generate embeddings
final embeddings = await ai.embed(
  embedder: flutterGemma.embedder('embedding-gemma-300m'),
  documents: [
    DocumentData(content: [TextPart(text: 'Flutter is a UI toolkit.')]),
    DocumentData(content: [TextPart(text: 'Dart is a programming language.')]),
  ],
);

for (final embedding in embeddings) {
  print('Vector (${embedding.embedding.length} dims): '
      '${embedding.embedding.take(5)}...');
}
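
Embedding vectors are typically compared with cosine similarity, for example to rank documents against a query. A minimal, dependency-free sketch follows. `cosineSimilarity` is a helper written for this example, not part of the plugin, and it assumes each `embedding.embedding` can be treated as a `List<double>`.

```dart
import 'dart:math' as math;

/// Cosine similarity between two equal-length vectors, in the range [-1, 1].
/// Returns 0.0 if either vector has zero magnitude.
/// Helper written for illustration; not part of the plugin API.
double cosineSimilarity(List<double> a, List<double> b) {
  if (a.length != b.length) {
    throw ArgumentError('Vectors must have the same length');
  }
  var dot = 0.0;
  var normA = 0.0;
  var normB = 0.0;
  for (var i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA == 0 || normB == 0) return 0.0;
  return dot / (math.sqrt(normA) * math.sqrt(normB));
}
```

Values near 1 indicate that two texts point in almost the same direction in embedding space; values near 0 indicate little similarity.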

Known Limitations

  • Model installation: The plugin does NOT manage model installation. The host app must install models via FlutterGemma.installModel() and embedders via FlutterGemma.installEmbedder() before using the plugin.
  • System role: System messages are passed natively via createChat(systemInstruction:) (requires flutter_gemma ^0.13.0). Only text content is supported in system messages.
  • Thinking mode: Requires .litertlm model format. Supported on Android, iOS, and Desktop. Not supported on Web.
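
The thinking-mode constraints above (a .litertlm model file, and no Web support) can be checked before setting isThinking. A minimal sketch in plain Dart follows. `canEnableThinking` is a hypothetical helper, not part of the plugin. It does not check whether the model architecture itself supports thinking, which the host app still has to know.

```dart
/// Returns true when the documented thinking-mode preconditions hold:
/// the model file uses the .litertlm format and the app is not on Web.
/// Hypothetical helper for illustration; not part of the plugin API.
/// It does not verify that the model architecture supports thinking.
bool canEnableThinking({
  required String modelFileName,
  required bool isWeb,
}) {
  final isLiteRtLm = modelFileName.toLowerCase().endsWith('.litertlm');
  return !isWeb && isLiteRtLm;
}
```

In a Flutter app, `isWeb` could be supplied from `kIsWeb` in `package:flutter/foundation.dart`, and the result used to decide whether to pass `isThinking: true`.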
