
llama_flutter_android #


Run GGUF models on Android with llama.cpp - A simple, MIT-licensed Flutter plugin.

Features #

  • Android Only - Optimized specifically for Android
  • Simple API - Easy-to-use Dart interface with Pigeon type safety
  • Token Streaming - Real-time token generation with EventChannel
  • Stop Generation - Cancel text generation mid-stream
  • 18 Parameters - Complete control: temperature, penalties, mirostat, seed, and more
  • 7 Chat Templates - ChatML, Llama-2, Alpaca, Vicuna, Phi, Gemma, Zephyr
  • Auto-Detection - Chat templates detected from model filename
  • Latest llama.cpp - Built on October 2025 llama.cpp (no patches needed)
  • ARM64 Optimized - NEON and dot product optimizations enabled
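Auto-detection matches the model filename against known template names. The plugin's exact matching rules aren't documented here; the `detectTemplate` helper below is an illustrative sketch of the idea, not the plugin's implementation:

```dart
/// Illustrative sketch: map filename keywords to the plugin's
/// seven template identifiers. Entries are checked in order and
/// 'chatml' is the documented default.
String detectTemplate(String modelPath) {
  final name = modelPath.split('/').last.toLowerCase();
  const keywords = {
    'llama-2': 'llama2',
    'llama2': 'llama2',
    'alpaca': 'alpaca',
    'vicuna': 'vicuna',
    'phi': 'phi',
    'gemma': 'gemma',
    'zephyr': 'zephyr',
  };
  for (final entry in keywords.entries) {
    if (name.contains(entry.key)) return entry.value;
  }
  return 'chatml'; // default template
}

void main() {
  print(detectTemplate('llama-2-7b-chat.Q4_K_M.gguf')); // llama2
  print(detectTemplate('mistral-7b-instruct.gguf'));    // chatml
}
```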

Local Chat App #

The plugin includes a complete example chat application that demonstrates how to integrate the plugin into a real application. The example app showcases:

  • Load Model from Local Storage - Select and load GGUF models directly from your device storage
  • Real-Time Context Updates - Monitor and adjust context usage in real time with visual indicators
  • Advanced Parameters - Full control over model parameters including Temperature Control, Top-P, Top-K, and more
  • Auto Unload Model - Automatically unload the model after a period of inactivity to free device resources
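The auto-unload behavior boils down to an inactivity timer that is reset on every user action. A pure-Dart sketch of the pattern (the `AutoUnloader` class and `onUnload` callback are illustrative, not the example app's actual code):

```dart
import 'dart:async';

/// Sketch: restart an inactivity timer on every user action;
/// when it fires, unload the model to free memory. `onUnload`
/// stands in for something like controller.dispose().
class AutoUnloader {
  final Duration idle;
  final void Function() onUnload;
  Timer? _timer;

  AutoUnloader(this.idle, this.onUnload);

  /// Call on every generation request or user interaction.
  void touch() {
    _timer?.cancel();
    _timer = Timer(idle, onUnload);
  }

  void cancel() => _timer?.cancel();
}

Future<void> main() async {
  var unloaded = false;
  final unloader =
      AutoUnloader(const Duration(milliseconds: 50), () => unloaded = true);
  unloader.touch(); // simulate a user action
  await Future.delayed(const Duration(milliseconds: 100));
  print(unloaded); // true: the idle window elapsed
}
```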

An APK of the example app is available in the example-app GitHub repository for immediate testing.

Screenshots #

(Home Screen · App Settings)

Requirements #

  • Flutter 3.24.0+
  • Dart SDK 3.3.0+
  • Android API 26+ (Android 8.0)
  • NDK r27+ (for 16KB page size support)

Installation #

Add to your pubspec.yaml:

dependencies:
  llama_flutter_android: ^0.1.1

Quick Start #

Basic Usage #

import 'dart:async';
import 'package:llama_flutter_android/llama_flutter_android.dart';

// Initialize controller
final controller = LlamaController();

// Load model
await controller.loadModel(
  modelPath: '/path/to/model.gguf',
  nThreads: 4,
  contextSize: 2048,
);

// Generate text with streaming
StreamSubscription? subscription;
subscription = controller.generate(
  prompt: 'Write a story about a robot',
  maxTokens: 512,
  temperature: 0.7,
).listen(
  (token) => print(token),  // Print each token as it arrives
  onDone: () => print('Generation complete!'),
  onError: (error) => print('Error: $error'),
);

// Stop generation mid-process (critical for UX!)
await controller.stop();
subscription?.cancel();

// Clean up
await controller.dispose();

Chat Mode with Templates #

// Chat with automatic template formatting
controller.generateChat(
  messages: [
    ChatMessage(role: 'system', content: 'You are a helpful assistant'),
    ChatMessage(role: 'user', content: 'Explain quantum computing'),
  ],
  template: 'chatml', // Auto-detected if null
  temperature: 0.7,
  maxTokens: 1000,
).listen((token) => print(token));
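The `chatml` template wraps each message in `<|im_start|>`/`<|im_end|>` markers and leaves the prompt open at the assistant turn. A standalone sketch of what the formatted prompt roughly looks like (the plugin formats this on the native side, so its exact output may differ; `ChatMessage` is redefined here only to keep the sketch self-contained):

```dart
// Standalone sketch of ChatML formatting; illustrative only.
class ChatMessage {
  final String role;
  final String content;
  ChatMessage({required this.role, required this.content});
}

String formatChatML(List<ChatMessage> messages) {
  final buffer = StringBuffer();
  for (final m in messages) {
    buffer.write('<|im_start|>${m.role}\n${m.content}<|im_end|>\n');
  }
  // Leave the assistant turn open so generation continues from here.
  buffer.write('<|im_start|>assistant\n');
  return buffer.toString();
}

void main() {
  print(formatChatML([
    ChatMessage(role: 'system', content: 'You are a helpful assistant'),
    ChatMessage(role: 'user', content: 'Explain quantum computing'),
  ]));
}
```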

Advanced Parameters #

// Fine-grained control over generation
controller.generate(
  prompt: 'Explain machine learning',
  maxTokens: 1000,
  // Sampling
  temperature: 0.8,      // Creativity (0.0-2.0)
  topP: 0.9,             // Nucleus sampling
  topK: 40,              // Top-K sampling
  minP: 0.05,            // Minimum probability
  // Penalties (reduce repetition)
  repeatPenalty: 1.2,    // Penalize repeated tokens
  frequencyPenalty: 0.5, // Penalize frequent tokens
  presencePenalty: 0.3,  // Penalize token presence
  repeatLastN: 64,       // Penalty window size
  // Reproducibility
  seed: 42,              // Fixed seed for same output
  // Mirostat (perplexity control)
  mirostat: 2,           // 0=off, 1=v1, 2=v2
  mirostatTau: 5.0,      // Target entropy (tau)
  mirostatEta: 0.1,      // Learning rate
).listen((token) => print(token));

// Stop anytime!
await controller.stop();
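To see what `repeatPenalty` does under the hood: llama.cpp penalizes the logits of tokens seen in the last `repeatLastN` positions, dividing positive logits by the penalty and multiplying negative ones. A pure-Dart illustration of that rule (not the plugin's code):

```dart
/// Illustration of llama.cpp-style repetition penalty: logits of
/// tokens seen in the recent window are pushed down so repeats
/// become less likely.
List<double> applyRepeatPenalty(
  List<double> logits,
  List<int> recentTokens,
  double penalty,
) {
  final out = List<double>.from(logits);
  for (final t in recentTokens.toSet()) {
    // Positive logits shrink; negative logits grow more negative.
    out[t] = out[t] > 0 ? out[t] / penalty : out[t] * penalty;
  }
  return out;
}

void main() {
  final logits = [2.0, -1.0, 0.5];
  // Tokens 0 and 1 were generated recently, so they get penalized.
  print(applyRepeatPenalty(logits, [0, 1], 1.2));
}
```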

Architecture #

         Flutter App (Dart)
                ↓
    llama_flutter_android.dart
    (User-facing API)
                ↓
    Pigeon Generated Code
    (Type-safe bridge)
                ↓
    LlamaFlutterAndroidPlugin.kt
    (Kotlin coroutines)
                ↓
    InferenceService.kt
    (Foreground service)
                ↓
    jni_wrapper.cpp
    (JNI bridge)
                ↓
    llama.cpp
    (Native inference)
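On the Dart side of this stack, tokens arriving from the native layer surface as a `Stream<String>`. Conceptually it behaves like the pure-Dart sketch below, which feeds a `StreamController` the way the EventChannel feeds the real plugin (`fakeGenerate` and its simulated native side are illustrative names, not plugin API):

```dart
import 'dart:async';

/// Sketch: native token callbacks feed a StreamController, which
/// the caller consumes as a Stream<String>. Closing the controller
/// corresponds to generation finishing (onDone on the Dart side).
Stream<String> fakeGenerate(List<String> tokens) {
  final controller = StreamController<String>();
  // Simulate the native side pushing tokens asynchronously.
  Future(() async {
    for (final t in tokens) {
      if (controller.isClosed) break;
      controller.add(t);
      await Future.delayed(const Duration(milliseconds: 1));
    }
    await controller.close();
  });
  return controller.stream;
}

Future<void> main() async {
  final received = <String>[];
  await for (final t in fakeGenerate(['Hello', ', ', 'world'])) {
    received.add(t);
  }
  print(received.join()); // Hello, world
}
```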

API Reference #

LlamaController #

The main interface for working with llama.cpp models.

Methods:

  • loadModel() - Load a GGUF model file
  • generate() - Generate text with streaming tokens
  • generateChat() - Generate chat responses with template formatting
  • stop() - Stop generation mid-process
  • dispose() - Clean up resources

Parameters:

  • Basic: maxTokens, seed
  • Sampling: temperature, topP, topK, minP, typicalP
  • Penalties: repeatPenalty, frequencyPenalty, presencePenalty, repeatLastN, penalizeNl
  • Mirostat: mirostat, mirostatTau, mirostatEta
  • Advanced: tfsZ, locallyTypical

Supported Chat Templates:

  • chatml - ChatML format (default)
  • llama2 - Llama-2 format
  • alpaca - Alpaca format
  • vicuna - Vicuna format
  • phi - Phi format
  • gemma - Gemma format
  • zephyr - Zephyr format

Release Build Notes #

If the release build crashes, try these fixes:

  • Enable large heap - Add android:largeHeap="true" to the <application> element in AndroidManifest.xml
  • Disable minification and shrinking - Set minifyEnabled false and shrinkResources false in the release build type of android/app/build.gradle
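The minification fix, sketched as a release build type in `android/app/build.gradle` (adapt this to your existing configuration):

```groovy
android {
    buildTypes {
        release {
            // Keep native symbols and resources intact to avoid
            // release-mode crashes caused by code/resource stripping.
            minifyEnabled false
            shrinkResources false
        }
    }
}
```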

Contributing #

Contributions are welcome! Please read CONTRIBUTING.md for details.

License #

MIT License - see LICENSE file for details.

Credits #

  • llama.cpp - The amazing inference engine
  • Pigeon - Type-safe platform communication
