llama_flutter_android library
Flutter plugin for running GGUF language models on Android using llama.cpp.
Provides LlamaController for loading models, generating text, and detecting GPU capabilities via Vulkan. Supports streaming token output, chat templates, and configurable generation parameters.
Quick start
import 'package:llama_flutter_android/llama_flutter_android.dart';

final controller = LlamaController();
final gpu = await controller.detectGpu();
await controller.loadModel(
  modelPath: '/path/to/model.gguf',
  gpuLayers: gpu.recommendedGpuLayers,
);
controller.generate(prompt: 'Hello!').listen(print);
Classes
- ChatMessage: A single message in a chat conversation.
- ChatRequest: Request for chat generation with automatic template formatting.
- ContextHelper: Helper class for managing model context and token limits.
- ContextInfo: Current KV-cache context usage for a loaded model.
- GenerateRequest: Request for text generation.
- GenerationConfig: Configuration for text generation.
- GpuInfo: GPU detection result.
- LlamaController: User-friendly controller for llama.cpp.
- ModelConfig: Configuration for loading a GGUF model.
- ModelLoadConfig: Configuration for model loading.