picolm_flutter 0.0.2

On-device LLM inference engine for Flutter. Run LLaMA-architecture models via Dart FFI — no cloud, no internet, no API keys.

picolm_flutter #

PicoLM Logo

A Flutter FFI plugin that wraps the picolm C inference engine, enabling on-device LLM inference for iOS, Android, and macOS.

Run LLaMA-architecture models (like TinyLlama) directly on device via Dart FFI — no cloud, no internet connection, and no API keys required!

Features #

  • Pure C11 engine: No C++ dependencies, built for maximum portability.
  • Cross-Platform: macOS, iOS, and Android support using Dart FFI.
  • GGUF Support: Load .gguf models directly from the device filesystem (mmap enabled).
  • Isolate-Powered Engine: Model loading and the inference loop run on background Isolates, keeping your Flutter UI at a smooth 60 fps.
  • Real-time Streaming: Tokens stream back to the UI in real-time via SendPort.
  • JSON Mode: Grammar-constrained output generation ensures valid JSON for tool calling.
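As a rough sketch of how grammar-constrained JSON output could be consumed, assuming `generate` accepts a hypothetical `jsonMode` flag (check the API reference for the actual option name):

```dart
import 'dart:convert';

import 'package:picolm_flutter/picolm_flutter.dart';

Future<void> extractCity(PicoLM model) async {
  const prompt =
      '<|user|>\nReturn the capital of France as JSON: {"city": ...}</s>\n<|assistant|>\n';

  final buffer = StringBuffer();
  // `jsonMode` is a hypothetical parameter name used here for illustration;
  // consult the API reference for the real grammar-constrained option.
  await for (final token in model.generate(prompt, maxTokens: 64, jsonMode: true)) {
    buffer.write(token);
  }

  // Grammar constraints guarantee the accumulated output parses as JSON.
  final result = jsonDecode(buffer.toString());
  print(result['city']);
}
```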

Getting Started #

Add the package to your pubspec.yaml:

dependencies:
  picolm_flutter: ^0.0.2

Download a Model #

You must provide a .gguf model file on the device. For example, TinyLlama 1.1B Q4_K_M.

Download the model to your app's documents directory.
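A minimal download sketch, assuming the `http` and `path_provider` packages (neither is a dependency of this plugin) and a model URL of your choosing:

```dart
import 'dart:io';

import 'package:http/http.dart' as http;
import 'package:path_provider/path_provider.dart';

/// Downloads a GGUF model into the app's documents directory,
/// skipping the download if the file is already present.
Future<String> downloadModel(String url, String fileName) async {
  final dir = await getApplicationDocumentsDirectory();
  final file = File('${dir.path}/$fileName');
  if (!file.existsSync()) {
    final response = await http.get(Uri.parse(url));
    await file.writeAsBytes(response.bodyBytes);
  }
  return file.path;
}
```

For a model of several hundred megabytes, prefer a streamed download (e.g. `http.Client().send`) with progress reporting over buffering the whole response in memory as above.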

Usage #

import 'package:picolm_flutter/picolm_flutter.dart';

void main() async {
  // Load the model (runs automatically on a background isolate, won't freeze UI)
  final model = await PicoLM.load('/path/to/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf');
  
  // (Optional) Configure sampling
  model.setSampling(temperature: 0.7, topP: 0.9);

  // TinyLlama requires the ChatML template
  final prompt = "<|user|>\nWhat is the speed of light?</s>\n<|assistant|>\n";
  
  // Generate tokens streaming to the UI
  await for (final token in model.generate(prompt, maxTokens: 200)) {
    print(token); // Update UI here
  }
  
  // Always clean up native resources when done
  model.dispose();
}

Example App #

Check out the example/ folder, which contains a fully working Flutter app with:

  • Downloading the 638MB model via HTTP directly to the device
  • Loading the model
  • A ChatGPT-like minimal streaming UI
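As a sketch of how the streaming API maps onto a widget (the widget structure here is illustrative, not taken from the example app):

```dart
import 'package:flutter/material.dart';
import 'package:picolm_flutter/picolm_flutter.dart';

/// Minimal streaming response widget: appends each generated token
/// to the displayed text as it arrives from the background isolate.
class ChatResponse extends StatefulWidget {
  const ChatResponse({super.key, required this.model, required this.prompt});

  final PicoLM model;
  final String prompt;

  @override
  State<ChatResponse> createState() => _ChatResponseState();
}

class _ChatResponseState extends State<ChatResponse> {
  final _buffer = StringBuffer();

  @override
  void initState() {
    super.initState();
    widget.model.generate(widget.prompt, maxTokens: 200).listen((token) {
      setState(() => _buffer.write(token)); // rebuild with the new token
    });
  }

  @override
  Widget build(BuildContext context) => Text(_buffer.toString());
}
```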

License #

MIT