
The plugin allows running the Gemma AI model locally on a device from a Flutter application.

Flutter Gemma #

The plugin supports not only Gemma, but also other models. Here's the full list of supported models: Gemma 2B & Gemma 7B, Gemma-2 2B, Gemma-3 1B, Phi-2, Phi-3, Phi-4, DeepSeek, Falcon-RW-1B, and StableLM-3B.

Note: Currently, the flutter_gemma plugin supports Gemma-3, Phi-4 and DeepSeek only on the Android and Web platforms. Support for iOS will be added in a future update. Gemma, Gemma 2 and the other models are supported on all platforms.

Gemma is a family of lightweight, state-of-the-art open models built from the same research and technology used to create the Gemini models.

(cover image: gemma_github_cover)

Bring the power of Google's lightweight Gemma language models directly to your Flutter applications. With Flutter Gemma, you can seamlessly incorporate advanced AI capabilities into your iOS and Android apps, all without relying on external servers.

Here is an example of usage:

(demo animation: gemma_github_gif)

Features #

  • Local Execution: Run Gemma models directly on user devices for enhanced privacy and offline functionality.
  • Platform Support: Compatible with both iOS and Android platforms.
  • LoRA Support: Efficient fine-tuning and integration of LoRA (Low-Rank Adaptation) weights for tailored AI behavior.
  • Ease of Use: Simple interface for integrating Gemma models into your Flutter projects.

Installation #

  1. Add flutter_gemma to your pubspec.yaml:

    dependencies:
      flutter_gemma: latest_version
    
  2. Run flutter pub get to install.
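After installation, import the package where you plan to use it. The plugin's top-level import is shown below; the classes used later in this document (FlutterGemmaPlugin, Message, etc.) are assumed to come from this single import:

import 'package:flutter_gemma/flutter_gemma.dart';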

Setup #

  1. Download Model and optionally LoRA Weights: Obtain a pre-trained Gemma model (recommended: 2b or 2b-it) from Kaggle
  2. Platform-specific setup:

iOS

  • Enable file sharing in info.plist:
<key>UIFileSharingEnabled</key>
<true/>
  • Change the linking type of pods to static: replace use_frameworks! in the Podfile with use_frameworks! :linkage => :static

Android

  • If you want to use the GPU to work with the model, you need to add OpenCL support in AndroidManifest.xml. If you plan to use only the CPU, you can skip this step.

Add the following to AndroidManifest.xml, above the closing </application> tag:

<uses-native-library
    android:name="libOpenCL.so"
    android:required="false"/>
<uses-native-library
    android:name="libOpenCL-car.so"
    android:required="false"/>
<uses-native-library
    android:name="libOpenCL-pixel.so"
    android:required="false"/>

Web

  • Web currently works only with GPU backend models; CPU backend models are not yet supported by MediaPipe.

  • Add the dependencies to the index.html file in the web folder:

  <script type="module">
  import { FilesetResolver, LlmInference } from 'https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai';
  window.FilesetResolver = FilesetResolver;
  window.LlmInference = LlmInference;
  </script>

Usage #

The new API splits functionality into two parts:

  • ModelFileManager: Manages model and LoRA weights file handling.
  • InferenceModel: Handles model initialization and response generation.


  • Access the plugin via:
final gemma = FlutterGemmaPlugin.instance;
  • Manage model files with ModelFileManager:
final modelManager = gemma.modelManager;

Place the model in the assets or upload it to a network drive, such as Firebase.

ATTENTION! You do not need to load the model every time the application starts; it is stored in the system files and only needs to be loaded once. Please carefully review the example application. Use the installModelFromAsset and downloadModelFromNetwork methods only when you need to place the model on the device.
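A simple way to keep the download a one-time operation is to persist your own flag once installation succeeds. The sketch below is only an illustration: it assumes the shared_preferences package is added to your app, and the 'model_installed' key is an app-defined flag, not part of the plugin API.

import 'package:flutter_gemma/flutter_gemma.dart';
import 'package:shared_preferences/shared_preferences.dart';

Future<void> ensureModelInstalled() async {
  final prefs = await SharedPreferences.getInstance();
  // App-defined flag; the plugin itself does not require it.
  if (prefs.getBool('model_installed') ?? false) return;

  final modelManager = FlutterGemmaPlugin.instance.modelManager;
  await modelManager.downloadModelFromNetwork('https://example.com/model.bin');
  await prefs.setBool('model_installed', true);
}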


1. Loading Models from assets (available only in debug mode):

Don't forget to add your model to pubspec.yaml.

  1. Loading from assets (loraPath is optional)
    await modelManager.installModelFromAsset('model.bin', loraPath: 'lora_weights.bin');
  2. Loading from assets with Progress Status (loraPath is optional)
    modelManager.installModelFromAssetWithProgress('model.bin', loraPath: 'lora_weights.bin').listen(
      (progress) {
        print('Loading progress: $progress%');
      },
      onDone: () {
        print('Model loading complete.');
      },
      onError: (error) {
        print('Error loading model: $error');
      },
    );

2. Loading Models from the network:

  • For web usage, you will also need to enable CORS (Cross-Origin Resource Sharing) for your network resource. To enable CORS in Firebase, you can follow the guide in the Firebase documentation: Setting up CORS

  1. Loading from the network (loraUrl is optional)
   await modelManager.downloadModelFromNetwork('https://example.com/model.bin', loraUrl: 'https://example.com/lora_weights.bin');
  2. Loading from the network with Progress Status (loraUrl is optional)
    modelManager.downloadModelFromNetworkWithProgress('https://example.com/model.bin', loraUrl: 'https://example.com/lora_weights.bin').listen(
      (progress) {
        print('Loading progress: $progress%');
      },
      onDone: () {
        print('Model loading complete.');
      },
      onError: (error) {
        print('Error loading model: $error');
      },
    );
3. Loading LoRA Weights

  1. Loading LoRA weights from the network.
await modelManager.downloadLoraWeightsFromNetwork('https://example.com/lora_weights.bin');
  2. Loading LoRA weights from assets.
await modelManager.installLoraWeightsFromAsset('lora_weights.bin');
4. Model Management

You can set the model and weights paths manually:
await modelManager.setModelPath('model.bin');
await modelManager.setLoraWeightsPath('lora_weights.bin');
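For instance, if your app manages the model download itself, you can point the plugin at the resulting file. This is a sketch assuming the path_provider package and a model.bin file that your app has already saved to the documents directory:

import 'package:flutter_gemma/flutter_gemma.dart';
import 'package:path_provider/path_provider.dart';

Future<void> useDownloadedModel() async {
  final dir = await getApplicationDocumentsDirectory();
  final modelManager = FlutterGemmaPlugin.instance.modelManager;
  // Assumes the app has already placed model.bin at this location.
  await modelManager.setModelPath('${dir.path}/model.bin');
}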

You can delete the model and weights from the device. Deleting the model or LoRA weights will automatically close and clean up the inference. This ensures that there are no lingering resources or memory leaks when switching models or updating files.

await modelManager.deleteModel();
await modelManager.deleteLoraWeights();

5. Initialize:

Before performing any inference, you need to create a model instance. This ensures that your application is ready to handle requests efficiently.

final inferenceModel = await FlutterGemmaPlugin.instance.createModel(
  modelType: ModelType.gemmaIt, // Required, model type to create
  preferedBackend: BackendType.gpu, // Optional, default is BackendType.gpu
  maxTokens: 512, // Optional, default is 1024
);

6. Using Sessions for Single Inferences:

If you need to generate individual responses without maintaining a conversation history, use sessions. Sessions allow precise control over inference and must be properly closed to avoid memory leaks.

  1. Synchronous Response Generation
final session = await inferenceModel.createSession(
  temperature: 1.0, // Optional, default is 0.8
  randomSeed: 1, // Optional, default is 1
  topK: 1, // Optional, default is 1
);

await session.addQueryChunk(Message(text: 'Tell me something interesting'));
String response = await session.getResponse();
print(response);

await session.close(); // Always close the session when done
  2. Asynchronous Response Generation
final session = await inferenceModel.createSession();
await session.addQueryChunk(Message(text: 'Tell me something interesting'));

session.getResponseAsync().listen((String token) {
  print(token);
}, onDone: () {
  print('Stream closed');
}, onError: (error) {
  print('Error: $error');
});

await session.close();  // Always close the session when done
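Because a session must be closed even if inference throws, wrapping the calls in try/finally is a robust pattern. A minimal sketch using only the session calls shown above:

final session = await inferenceModel.createSession();
try {
  await session.addQueryChunk(Message(text: 'Tell me something interesting'));
  final response = await session.getResponse();
  print(response);
} finally {
  await session.close(); // Runs even if getResponse throws.
}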

7. Chat Scenario with Automatic Session Management

For chat-based applications, you can create a chat instance. Unlike sessions, the chat instance manages the conversation context and refreshes sessions when necessary.

final chat = await inferenceModel.createChat(
  temperature: 0.8, // Controls response randomness
  randomSeed: 1, // Ensures reproducibility
  topK: 1, // Limits vocabulary scope
);
  1. Synchronous Chat
await chat.addQueryChunk(Message(text: 'User: Hello, who are you?'));
String response = await chat.generateChatResponse();
print(response);

await chat.addQueryChunk(Message(text: 'User: Are you sure?'));
String response2 = await chat.generateChatResponse();
print(response2);
  2. Asynchronous Chat (Streaming)
await chat.addQueryChunk(Message(text: 'User: Hello, who are you?'));

chat.generateChatResponseAsync().listen((String token) {
  print(token);
}, onDone: () {
  print('Chat stream closed');
}, onError: (error) {
  print('Chat error: $error');
});

await chat.addQueryChunk(Message(text: 'User: Are you sure?'));
chat.generateChatResponseAsync().listen((String token) {
  print(token);
}, onDone: () {
  print('Chat stream closed');
}, onError: (error) {
  print('Chat error: $error');
});
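If you need the streamed reply as a single string (for example, to store it in your chat history), you can collect the stream instead of printing each token. A sketch built on the same generateChatResponseAsync stream:

await chat.addQueryChunk(Message(text: 'User: Hello, who are you?'));

final buffer = StringBuffer();
await for (final token in chat.generateChatResponseAsync()) {
  buffer.write(token); // Accumulate tokens as they arrive.
}
final fullResponse = buffer.toString();
print(fullResponse);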

8. Checking Token Usage

You can check the token size of a prompt before inference. The accumulated context should not exceed maxTokens to ensure smooth operation.

int tokenCount = await session.sizeInTokens('Your prompt text here');
print('Prompt size in tokens: $tokenCount');
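For example, you can guard a prompt against the limit before adding it to a session. A sketch using sizeInTokens with the 512-token limit chosen in the createModel example above:

const maxTokens = 512;
final prompt = 'Your prompt text here';

final tokenCount = await session.sizeInTokens(prompt);
if (tokenCount < maxTokens) {
  await session.addQueryChunk(Message(text: prompt));
} else {
  print('Prompt is $tokenCount tokens, which exceeds the $maxTokens-token limit.');
}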

9. Closing the Model

When you no longer need to perform any further inferences, call the close method to release resources:

await inferenceModel.close();

If you need to use the inference again later, remember to call createModel again before generating responses.
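A minimal close-and-recreate flow, using only the createModel call from step 5:

await inferenceModel.close(); // Release native resources.

// Later, before generating responses again:
final newInferenceModel = await FlutterGemmaPlugin.instance.createModel(
  modelType: ModelType.gemmaIt,
);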

You can find the full and complete example in the example folder.

Important Considerations

  • Model Size: Larger models (such as 7b and 7b-it) might be too resource-intensive for on-device inference.
  • LoRA Weights: They provide efficient customization without the need for full model retraining.
  • Development vs. Production: For production apps, do not embed the model or LoRA weights within your assets. Instead, load them once and store them securely on the device or via a network drive.
  • Web Models: Currently, Web support is available only for GPU backend models.

Upcoming Features

In the next version, expect support for multimodality with Gemma 3, enabling text, image, and potentially other input types for even more advanced AI-powered applications.
