# llamadart

A Dart/Flutter plugin for llama.cpp: run LLM inference on any platform using GGUF models.
llamadart is a high-performance Dart and Flutter plugin for llama.cpp. It allows you to run Large Language Models (LLMs) locally using GGUF models across all major platforms with minimal setup.
## ✨ Features
- 🚀 High Performance: Powered by llama.cpp's optimized C++ kernels.
- 🛠️ Zero Configuration: Uses the modern Pure Native Asset mechanism; no manual build scripts or platform folders required.
- 📱 Cross-Platform: Full support for Android, iOS, macOS, Linux, and Windows.
- ⚡ GPU Acceleration:
  - Apple: Metal (macOS/iOS)
  - Android/Linux/Windows: Vulkan
- 🌐 Web Support: Run inference in the browser via WASM (powered by wllama).
- 🎯 Dart-First API: Streamlined FFI bindings with a clean, isolate-safe Dart interface.
- 📝 Logging Control: Granular control over native engine output (debug, info, warn, error, none); see the sketch after this list.
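As a rough illustration of what granular logging control looks like in practice. The `LlamaLogLevel` enum and `logLevel` parameter below are assumptions made for this sketch; check the `LlamaService` API docs for the actual names:

```dart
import 'package:llamadart/llamadart.dart';

void main() async {
  final service = LlamaService();

  // Hypothetical: silence everything from the native engine except errors.
  // The exact enum/parameter names are assumptions; consult the API docs.
  await service.init(
    'path/to/your_model.gguf',
    logLevel: LlamaLogLevel.error,
  );
}
```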
## 📊 Compatibility & Test Status
| Platform | Architecture(s) | GPU Backend | Status |
|---|---|---|---|
| macOS | arm64, x86_64 | Metal | ✅ Tested (CPU, Metal) |
| iOS | arm64 (device), x86_64 (simulator) | Metal (device), CPU (simulator) | ✅ Tested (CPU, Metal) |
| Android | arm64-v8a, x86_64 | Vulkan | ✅ Tested (CPU, Vulkan) |
| Linux | arm64, x86_64 | Vulkan | ⚠️ Partially tested (CPU verified, Vulkan untested) |
| Windows | x64 | Vulkan | ✅ Tested (CPU, Vulkan) |
| Web | WASM | CPU | ✅ Tested (WASM) |
## 🚀 Quick Start
### 1. Installation
Add `llamadart` to your `pubspec.yaml`:

```yaml
dependencies:
  llamadart: ^0.2.0
```
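Alternatively, `dart pub add llamadart` (or `flutter pub add llamadart` in a Flutter project) adds the dependency and resolves packages in one step.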
### 2. Zero Setup (Native Assets)
llamadart leverages the Dart Native Assets (build hooks) system. When you run your app for the first time (`dart run` or `flutter run`), the package automatically:
- Detects your target platform and architecture.
- Downloads the appropriate pre-compiled stable binary from GitHub.
- Bundles it seamlessly into your application.
You never need to download binaries manually or configure CMake.
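Note: depending on your SDK channel and version, native assets may still be gated behind a flag. If the build hook does not run, try `flutter config --enable-native-assets` or `dart --enable-experiment=native-assets run`; treat these flags as version-dependent and check the setup docs for your SDK.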
### 3. Basic Usage
```dart
import 'dart:io';

import 'package:llamadart/llamadart.dart';

void main() async {
  // 1. Create the service
  final service = LlamaService();

  // 2. Initialize with a GGUF model
  // This loads the model and prepares the native backend (GPU/CPU)
  await service.init('path/to/your_model.gguf');

  // 3. Generate text (streaming)
  final stream = service.generate('The capital of France is');
  await for (final token in stream) {
    stdout.write(token);
    await stdout.flush();
  }

  // 4. Clean up resources
  service.dispose();
}
```
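Because `generate` returns an ordinary Dart stream, standard combinators work on it; for example, to collect the full completion instead of printing tokens incrementally:

```dart
// Gather all streamed tokens into one string. Stream<String>.join
// concatenates the events with an empty separator by default.
final text = await service.generate('The capital of France is').join();
print(text);
```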
## 📚 Examples
Explore the example/ directory for full implementations:

- `basic_app`: A lightweight CLI example for quick verification.
- `chat_app`: A feature-rich Flutter chat application with streaming UI and model management.
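For instance, assuming `basic_app` follows the standard Dart package layout (the exact path and entrypoint may differ), you can run it with `dart run` from inside `example/basic_app`.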
## 🐳 Docker (Linux)
You can build and run the examples with Docker on Linux, which guarantees that all build dependencies (such as `libgtk-3-dev` and `cmake`) are correctly configured.
### 1. Build and Run the Basic CLI Example
```sh
./docker/build-docker.sh basic-run
```
### 2. Build the Flutter Chat App for Linux
```sh
./docker/build-docker.sh chat-build
```
The multi-stage Dockerfile is optimized to keep the build context small; it handles downloading the native assets and compiling the Flutter Linux binaries.
## 🏗️ Architecture
This package follows the "Pure Native Asset" philosophy:
- Maintenance: All native build logic and submodules are isolated in `third_party/`.
- Distribution: Binaries are produced via GitHub Actions and hosted on GitHub Releases.
- Integration: `hook/build.dart` manages the lifecycle of native dependencies, keeping your project root clean (see the conceptual sketch below).
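Conceptually, the build hook boils down to three steps. The sketch below is illustrative only; it is not the package's actual `hook/build.dart`, the release URL is made up, and it uses plain `dart:io`/`dart:ffi` rather than the real build-hook API:

```dart
// Illustrative sketch of what a native-asset build hook does conceptually.
// NOT the package's actual hook/build.dart; the URL below is hypothetical.
import 'dart:ffi' show Abi;
import 'dart:io';

Future<void> main() async {
  // 1. Detect the target platform + architecture (e.g. "linux_x64").
  final abi = Abi.current().toString();

  // 2. Download the matching pre-built llama.cpp binary from GitHub Releases.
  final url = Uri.parse(
      'https://github.com/example/llamadart/releases/download/v0.2.0/llama_$abi.tar.gz');
  await Directory('build').create(recursive: true);
  final client = HttpClient();
  final request = await client.getUrl(url);
  final response = await request.close();
  await response.pipe(File('build/llama_$abi.tar.gz').openWrite());
  client.close();

  // 3. Report the unpacked library to the Dart/Flutter build as a code asset
  //    (in the real hook this goes through the build-hook protocol), so the
  //    binary is bundled into the app automatically.
}
```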
## 🤝 Contributing
Contributions are welcome! Please see CONTRIBUTING.md for architecture details and maintainer instructions for building native binaries.
## 📄 License
This project is licensed under the MIT License; see the LICENSE file for details.