llama_flutter_android library

Flutter plugin for running GGUF language models on Android using llama.cpp.

Provides LlamaController for loading models, generating text, and detecting GPU capabilities via Vulkan. Supports streaming token output, chat templates, and configurable generation parameters.

Quick start

import 'package:llama_flutter_android/llama_flutter_android.dart';

// Detect available GPU acceleration (Vulkan) before loading.
final controller = LlamaController();
final gpu = await controller.detectGpu();

// Load the model, offloading the recommended number of layers to the GPU.
await controller.loadModel(
  modelPath: '/path/to/model.gguf',
  gpuLayers: gpu.recommendedGpuLayers,
);

// generate() returns a stream of tokens; print each as it arrives.
controller.generate(prompt: 'Hello!').listen(print);
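
For multi-turn conversations, the ChatMessage and ChatRequest classes listed below apply the model's chat template automatically. A minimal sketch of how they might be used; the field names (role, content, messages) and the chat method are illustrative assumptions, not confirmed by this page:

// Hypothetical chat sketch; check the ChatMessage and ChatRequest docs
// for the actual constructors and the exact generation method.
final messages = [
  ChatMessage(role: 'system', content: 'You are a helpful assistant.'),
  ChatMessage(role: 'user', content: 'Summarize GGUF in one sentence.'),
];

// ChatRequest formats the messages with the model's chat template
// before generation, so no manual prompt formatting is needed.
final request = ChatRequest(messages: messages);
controller.chat(request).listen(print); // chat() is assumed, not documented here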

Classes

ChatMessage
A single message in a chat conversation.
ChatRequest
Request for chat generation with automatic template formatting.
ContextHelper
Helper class for managing model context and token limits.
ContextInfo
Current KV-cache context usage for a loaded model.
GenerateRequest
Request for text generation.
GenerationConfig
Configuration for text generation.
GpuInfo
GPU detection result.
LlamaController
A user-friendly controller for llama.cpp.
ModelConfig
Configuration for loading a GGUF model.
ModelLoadConfig
Configuration for model loading.
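
The configuration classes above can be combined along these lines. This is a hedged sketch only: parameter names such as contextSize, temperature, and maxTokens are illustrative assumptions, not taken from this page.

// Hypothetical configuration sketch; consult each class's documentation
// for the actual field names and defaults.
await controller.loadModel(
  modelPath: '/path/to/model.gguf',
  config: ModelLoadConfig(contextSize: 4096), // contextSize is assumed
);

// GenerationConfig bundles sampling parameters for a request.
final config = GenerationConfig(temperature: 0.7, maxTokens: 256); // assumed fields
controller.generate(prompt: 'Hello!', config: config).listen(print);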