picolm_flutter 0.0.2
On-device LLM inference engine for Flutter. Run LLaMA-architecture models via Dart FFI — no cloud, no internet, no API keys.
# picolm_flutter
A Flutter FFI plugin that wraps the picolm C inference engine, enabling on-device LLM inference for iOS, Android, and macOS.
Run LLaMA-architecture models (like TinyLlama) directly on device via Dart FFI — no cloud, no internet connection, and no API keys required!
## Features
- Pure C11 engine: No C++ dependencies, built for maximum portability.
- Cross-Platform: macOS, iOS, and Android support using Dart FFI.
- GGUF Support: Load `.gguf` models directly from the device filesystem (mmap enabled).
- Isolate-Powered Engine: Model loading and the inference loop run on background isolates to keep your Flutter UI at a smooth 60 fps.
- Real-time Streaming: Tokens stream back to the UI in real time via `SendPort`.
- JSON Mode: Grammar-constrained output generation ensures valid JSON for tool calling.
## Getting Started
Add the package to your `pubspec.yaml`:

```yaml
dependencies:
  picolm_flutter: ^0.0.2
```
## Download a Model
You must provide a `.gguf` model file on the device, for example TinyLlama 1.1B Q4_K_M. Download the model to your app's documents directory.
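One way to fetch the model at runtime is sketched below. This is not part of the package API: it assumes the community `http` and `path_provider` packages are added to your `pubspec.yaml`, and the download URL is a placeholder you must replace.

```dart
import 'dart:io';

import 'package:http/http.dart' as http;
import 'package:path_provider/path_provider.dart';

/// Downloads the model into the app's documents directory (if not
/// already present) and returns its path for PicoLM.load.
Future<String> downloadModel() async {
  final dir = await getApplicationDocumentsDirectory();
  final file = File('${dir.path}/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf');
  if (!await file.exists()) {
    // Placeholder URL; for a 600+ MB file, prefer a streamed download
    // (http.Client().send) so the whole body is not held in memory.
    final response =
        await http.get(Uri.parse('https://example.com/path/to/model.gguf'));
    await file.writeAsBytes(response.bodyBytes);
  }
  return file.path;
}
```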
## Usage
```dart
import 'package:picolm_flutter/picolm_flutter.dart';

void main() async {
  // Load the model (runs automatically on a background isolate, won't freeze the UI)
  final model = await PicoLM.load('/path/to/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf');

  // (Optional) Configure sampling
  model.setSampling(temperature: 0.7, topP: 0.9);

  // TinyLlama requires the ChatML template
  final prompt = "<|user|>\nWhat is the speed of light?</s>\n<|assistant|>\n";

  // Generate tokens, streaming them to the UI
  await for (final token in model.generate(prompt, maxTokens: 200)) {
    print(token); // Update UI here
  }

  // Always clean up native resources when done
  model.dispose();
}
```
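In a real app you will usually feed the token stream into widget state rather than `print`. A minimal sketch, assuming a `PicoLM` model already loaded as above (the widget and field names here are illustrative, not part of the package):

```dart
import 'package:flutter/material.dart';
import 'package:picolm_flutter/picolm_flutter.dart';

/// Renders a single streamed reply, appending tokens as they arrive.
class ChatReply extends StatefulWidget {
  const ChatReply({super.key, required this.model, required this.prompt});

  final PicoLM model;
  final String prompt;

  @override
  State<ChatReply> createState() => _ChatReplyState();
}

class _ChatReplyState extends State<ChatReply> {
  String _reply = '';

  @override
  void initState() {
    super.initState();
    // Append each token to the reply text as it streams in.
    widget.model.generate(widget.prompt, maxTokens: 200).listen((token) {
      setState(() => _reply += token);
    });
  }

  @override
  Widget build(BuildContext context) => Text(_reply);
}
```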
## Example App
Check out the `example/` folder, which contains a fully working Flutter app with:

- Downloading the 638 MB model via HTTP directly to the device
- Loading the model
- A minimal ChatGPT-like streaming UI