llama_flutter_android library

Flutter plugin for running GGUF language models on Android using llama.cpp.

Provides LlamaController for loading models, generating text, and detecting GPU capabilities via Vulkan. Supports streaming token output, chat templates, and configurable generation parameters.

Quick start

import 'package:llama_flutter_android/llama_flutter_android.dart';

// Detect available GPU acceleration (Vulkan) before loading.
final controller = LlamaController();
final gpu = await controller.detectGpu();

// Load the model, offloading the recommended number of layers to the GPU.
await controller.loadModel(
  modelPath: '/path/to/model.gguf',
  gpuLayers: gpu.recommendedGpuLayers,
);

// generate() returns a stream of tokens; print each as it arrives.
controller.generate(prompt: 'Hello!').listen(print);
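
For multi-turn conversations, the ChatMessage and ChatRequest classes listed below apply the model's chat template automatically. A minimal sketch of how they might be used; the field names (role, content, messages) and the chat method are illustrative assumptions, not confirmed by this page:

// Hypothetical chat sketch; check the ChatMessage and ChatRequest docs
// for the actual constructors and the exact generation method.
final messages = [
  ChatMessage(role: 'system', content: 'You are a helpful assistant.'),
  ChatMessage(role: 'user', content: 'Summarize GGUF in one sentence.'),
];

// ChatRequest formats the messages with the model's chat template
// before generation, so no manual prompt formatting is needed.
final request = ChatRequest(messages: messages);
controller.chat(request).listen(print); // chat() is assumed, not documented here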

Classes

ChatMessage
A single message in a chat conversation.
ChatRequest
Request for chat generation with automatic template formatting.
ContextHelper
Helper class for managing model context and token limits.
ContextInfo
Current KV-cache context usage for a loaded model.
GenerateRequest
Request for text generation.
GenerationConfig
Configuration for text generation.
GpuInfo
GPU detection result.
LlamaController
A user-friendly controller for llama.cpp.
ModelConfig
Configuration for loading a GGUF model.
ModelLoadConfig
Configuration for model loading.
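
The configuration classes above can be combined along these lines. This is a hedged sketch only: parameter names such as contextSize, temperature, and maxTokens are illustrative assumptions, not taken from this page.

// Hypothetical configuration sketch; consult each class's documentation
// for the actual field names and defaults.
await controller.loadModel(
  modelPath: '/path/to/model.gguf',
  config: ModelLoadConfig(contextSize: 4096), // contextSize is assumed
);

// GenerationConfig bundles sampling parameters for a request.
final config = GenerationConfig(temperature: 0.7, maxTokens: 256); // assumed fields
controller.generate(prompt: 'Hello!', config: config).listen(print);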