
llamadart #


llamadart is a high-performance Dart and Flutter plugin for llama.cpp. It allows you to run Large Language Models (LLMs) locally using GGUF models across all major platforms with minimal setup.

✨ Features #

  • 🚀 High Performance: Powered by llama.cpp's optimized C++ kernels.
  • 🛠️ Zero Configuration: Uses the modern Pure Native Asset mechanism; no manual build scripts or platform folders required.
  • 📱 Cross-Platform: Full support for Android, iOS, macOS, Linux, and Windows.
  • ⚡ GPU Acceleration:
    • Apple: Metal (macOS/iOS)
    • Android/Linux/Windows: Vulkan
  • 🎨 LoRA Support: Apply fine-tuned adapters (GGUF) dynamically at runtime.
  • 🌐 Web Support: Run inference in the browser via WASM (powered by wllama v2).
  • 💎 Dart-First API: Streamlined architecture with decoupled backends.
  • 🔇 Logging Control: Toggle native engine output or use granular filtering on Web.
  • 🧪 High Coverage: Robust test suite with 80%+ global core coverage.

πŸ—οΈ Architecture #

llamadart 0.3.0+ uses a modern, decoupled architecture designed for flexibility and platform independence:

  • LlamaEngine: The primary high-level orchestrator. It handles model lifecycle, tokenization, chat templating, and manages the inference stream.
  • LlamaBackend: A platform-agnostic interface that allows swapping implementation details:
    • NativeLlamaBackend: Uses Dart FFI and background Isolates for high-performance desktop/mobile inference.
    • WebLlamaBackend: Uses WebAssembly and the wllama JS library for in-browser inference.
  • LlamaBackendFactory: Automatically selects the appropriate backend for your current platform.
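
The snippet below ties these pieces together. It is a minimal sketch that uses only the constructors shown in the Usage examples further down; the helper function name is our own.

import 'package:llamadart/llamadart.dart';

// Sketch only: LlamaBackend() defers to LlamaBackendFactory to pick the
// right implementation for the current platform, while constructing
// NativeLlamaBackend() pins FFI/Isolate-based inference explicitly.
LlamaEngine createEngine({bool forceNative = false}) {
  final backend = forceNative ? NativeLlamaBackend() : LlamaBackend();
  return LlamaEngine(backend);
}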

🚀 Supported Platforms #

| Platform | Architecture(s)              | GPU Backend               | Status    |
|----------|------------------------------|---------------------------|-----------|
| macOS    | arm64, x86_64                | Metal                     | ✅ Tested |
| iOS      | arm64 (Device), x86_64 (Sim) | Metal (Device), CPU (Sim) | ✅ Tested |
| Android  | arm64-v8a, x86_64            | Vulkan                    | ✅ Tested |
| Linux    | arm64, x86_64                | Vulkan                    | ✅ Tested |
| Windows  | x64                          | Vulkan                    | ✅ Tested |
| Web      | WASM                         | CPU                       | ✅ Tested |

📦 Installation #

Add llamadart to your pubspec.yaml:

dependencies:
  llamadart: ^0.3.0

Zero Setup (Native Assets) #

llamadart leverages the Dart Native Assets (build hooks) system. When you run your app for the first time (dart run or flutter run), the package automatically:

  1. Detects your target platform and architecture.
  2. Downloads the appropriate pre-compiled binary from GitHub.
  3. Bundles it seamlessly into your application.

No manual binary downloads, CMake configuration, or platform-specific project changes are needed.
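
Nothing changes in your workflow; the hook runs as part of the standard build commands:

# First run: the build hook detects the target, fetches the matching
# pre-compiled llama.cpp binary from GitHub, and bundles it.
flutter run    # Flutter apps
dart run       # pure Dart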


πŸ› οΈ Usage #

1. Simple Usage #

The easiest way to get started is by using the default LlamaBackend.

import 'package:llamadart/llamadart.dart';

void main() async {
  // Automatically selects Native or Web backend
  final engine = LlamaEngine(LlamaBackend());

  try {
    // Initialize with a local GGUF model
    await engine.loadModel('path/to/model.gguf');

    // Generate text (streaming)
    await for (final token in engine.generate('The capital of France is')) {
      print(token);
    }
  } finally {
    // CRITICAL: Always dispose the engine to release native resources
    await engine.dispose();
  }
}

2. Advanced Usage (Decoupled Engine) #

Construct a specific backend and pass it to LlamaEngine for more granular control, such as swapping backends or managing context manually.

import 'package:llamadart/llamadart.dart';

void main() async {
  // Explicitly select Native or Web backend
  final backend = NativeLlamaBackend(); 
  final engine = LlamaEngine(backend);

  try {
    await engine.loadModel('model.gguf');

    // High-level Chat interface (handles templates and stop sequences)
    final messages = [
      LlamaChatMessage(role: 'system', content: 'You are a poetic assistant.'),
      LlamaChatMessage(role: 'user', content: 'Tell a story about a cat.'),
    ];

    await for (final text in engine.chat(messages)) {
      print(text);
    }
  } finally {
    await engine.dispose();
  }
}

🧹 Resource Management #

Since llamadart allocates significant native memory and manages background worker Isolates/Threads, it is essential to manage its lifecycle correctly.

  • Explicit Disposal: Always call await engine.dispose() when you are finished with an engine instance.
  • Native Stability: On mobile and desktop, failing to dispose can lead to "hanging" background processes or memory pressure.
  • Hot Restart Support: In Flutter, placing the engine inside a Provider or State and calling dispose() in the appropriate lifecycle method ensures stability across Hot Restarts, as shown below.

@override
void dispose() {
  // State.dispose() cannot be async, so this is fire-and-forget;
  // the engine releases its native resources in the background.
  _engine.dispose();
  super.dispose();
}

🎨 Low-Rank Adaptation (LoRA) #

llamadart supports applying multiple LoRA adapters dynamically at runtime.

  • Dynamic Scaling: Adjust the strength (scale) of each adapter on the fly.
  • Isolate-Safe: Native adapters are managed in a background Isolate to prevent UI jank.
  • Efficient: Multiple LoRAs share the memory of a single base model.

Check out our LoRA Training Notebook to learn how to train and convert your own adapters.
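
The adapter API itself is not spelled out in this README, so the sketch below is illustrative only: the method names loadLoraAdapter and setLoraScale are assumptions, not the confirmed API. Consult the package's API reference for the actual signatures.

import 'package:llamadart/llamadart.dart';

// Hypothetical method names below -- illustrative, not the actual API.
Future<void> runWithLora(LlamaEngine engine) async {
  await engine.loadModel('base-model.gguf');

  // Apply a GGUF LoRA adapter on top of the loaded base model...
  final adapter = await engine.loadLoraAdapter('style-adapter.gguf');
  // ...and tune its strength at runtime (dynamic scaling).
  await engine.setLoraScale(adapter, 0.75);

  await for (final token in engine.generate('Write a haiku about rain.')) {
    print(token);
  }
}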


🧪 Testing & Quality #

This project maintains a high standard of quality with 80%+ global test coverage.

  • Native Tests: Integration tests using real GGUF models via FFI.
  • Web Tests: Browser-based unit and integration tests using Chrome.
  • CI/CD: Automatic analysis, linting, and cross-platform test execution on every PR.

# Run all native tests
dart test

# Run web tests (requires Chrome)
dart test -p chrome test/web_backend_unit_test.dart

🤝 Contributing #

Contributions are welcome! Please see CONTRIBUTING.md for architecture details and maintainer instructions for building native binaries.

📜 License #

This project is licensed under the MIT License - see the LICENSE file for details.
