# dart_llama 0.2.0
A Dart package for interfacing with llama.cpp models using FFI
# Changelog

## 0.2.0 - 2025-01-21

### Added
- **Dart CLI Build Tool** (`dart_llama_tool`)
  - `global-install` - One-command global installation: builds, installs library, activates globally
  - `setup` - Complete setup: builds llama.cpp, wrapper, regenerates FFI bindings, downloads model, runs tests
  - `build` - Build both llama.cpp and wrapper libraries
  - `build-llama` - Build only llama.cpp library (supports `--static` flag)
  - `build-wrapper` - Build only the wrapper library (supports `--static` flag)
  - `compile` - Compile CLI tools to native executables with bundled libraries
  - `install-lib` - Install wrapper library to `~/.dart_llama/` for global CLI usage
  - `ffigen` - Regenerate FFI bindings
  - `download-model` - Download Gemma 3 1B model for testing
  - `clean` - Remove built libraries, static libraries, dist folder, and llama.cpp source
- **Static Linking Support**
  - `build-llama --static` builds llama.cpp as a static library
  - `build-wrapper --static` links the wrapper with llama.cpp statically
  - A single `libllama_wrapper.dylib` contains all code for easier distribution
  - No dependency on a separate `libllama.dylib` when using static linking
- **Native Executable Compilation**
  - `compile` command creates a `dist/` folder with native executables
  - Bundles executables with the required dynamic library
- **Globally Installable CLI Tools**
  - `ldcompletion` - Text completion CLI with streaming support
  - `ldchat` - Interactive Gemma chat CLI with streaming support
- **Context Management** (usage sketch below)
  - `clearContext()` method to reset the KV cache between chat turns
  - Fixes the context overflow error in long conversations
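
  A minimal sketch of a multi-turn loop using the new method. `clearContext()` is from this release; the `LlamaModel` constructor parameter, `generate`, `prompt`, and `response.text` are assumptions about the surrounding API, not confirmed signatures.

  ```dart
  import 'package:dart_llama/dart_llama.dart';

  void main() {
    // Hypothetical constructor parameter; consult the package docs.
    final model = LlamaModel(modelPath: 'models/gemma-3-1b.gguf');
    for (final turn in ['Hello!', 'Explain FFI in one line.']) {
      // Hypothetical call shape for a single chat turn.
      final response = model.generate(GenerationRequest(prompt: turn));
      print(response.text);
      // New in 0.2.0: reset the KV cache so long chats no longer overflow.
      model.clearContext();
    }
    model.dispose();
  }
  ```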
- **Typed Exception Hierarchy** (example below)
  - `LlamaException` - sealed base class for all llama-related errors
  - `ModelLoadException` - thrown when a model fails to load
  - `ContextCreationException` - thrown when context creation fails
  - `TokenizationException` - thrown when tokenization fails
  - `PromptTooLongException` - thrown when a prompt exceeds limits
  - `ContextOverflowException` - thrown when the context window fills up
  - `DecodeException` - thrown when decoding fails
  - Enables pattern matching for error handling instead of string matching
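
  A sketch of exhaustive error handling enabled by the sealed base class, assuming the subtypes listed above are the complete set and that `generate` and `GenerationResponse.text` look as shown:

  ```dart
  import 'package:dart_llama/dart_llama.dart';

  String? tryGenerate(LlamaModel model, String prompt) {
    try {
      // Hypothetical call shape; see the package docs for the real API.
      return model.generate(GenerationRequest(prompt: prompt)).text;
    } on LlamaException catch (e) {
      // Because LlamaException is sealed, the compiler can check that
      // this switch covers every subtype.
      switch (e) {
        case PromptTooLongException():
          return null; // caller should shorten the prompt
        case ContextOverflowException():
          model.clearContext(); // reset the KV cache; the caller may retry
          return null;
        case ModelLoadException():
        case ContextCreationException():
        case TokenizationException():
        case DecodeException():
          rethrow;
      }
    }
  }
  ```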
- **Comprehensive Test Suite** (test sketch below)
  - Unit tests for all data models and exception classes
  - Tests for `LlamaModel` lifecycle and error handling
  - Tests for the `clearContext()` method
  - Tests for stop sequences in both generate and stream modes
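
  A minimal example of the kind of test this release adds, written with `package:test`; the `modelPath` parameter name is hypothetical:

  ```dart
  import 'package:dart_llama/dart_llama.dart';
  import 'package:test/test.dart';

  void main() {
    test('loading a missing model throws ModelLoadException', () {
      expect(
        () => LlamaModel(modelPath: 'no_such_model.gguf'),
        throwsA(isA<ModelLoadException>()),
      );
    });
  }
  ```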
### Changed
- **Project Structure Reorganization**
  - Moved `llama_wrapper.c` and `llama_wrapper.h` to the `native/` directory
  - Updated build scripts and ffigen configuration for the new paths
- **llama.cpp Management**
  - Pinned llama.cpp to version b7783 for reproducible builds
  - llama.cpp is now fetched during build (not stored in the repo)
  - Uses a shallow clone for faster downloads
- **Build System**
  - Replaced bash scripts with a Dart CLI tool using the `args` package
  - Removed the `scripts/` directory (bash scripts)
## 0.1.2 - 2025-08-05

### Changed
- Renamed `example/completion.dart` to `example/main.dart` for better pub.dev scoring
- Added comprehensive documentation for the `GenerationResponse` class and all its properties
## 0.1.1 - 2025-08-04

### Documentation
- Added comprehensive documentation comments to all public API elements
- Added library-level documentation with examples and getting started guide
- Documented all `GenerationRequest` parameters with usage guidance
- Documented all `LlamaConfig` parameters with recommendations
- Improved overall API documentation coverage from 26.5% to 100%
## 0.1.0 - 2025-08-04

### Initial Release
- **Core Features**
  - FFI-based Dart bindings for llama.cpp
  - Low-level `LlamaModel` API for direct text generation control
  - Support for loading GGUF model files
  - Automatic memory management with proper cleanup
  - Real-time streaming support with token-by-token generation
  - Configurable stop sequences for controlling generation boundaries
- **API Features** (example below)
  - `LlamaModel` - Main class for model initialization and text generation
  - `GenerationRequest` - Configurable generation parameters
  - `GenerationResponse` - Detailed generation results with token counts
  - Streaming and non-streaming generation modes
  - Temperature, top-p, top-k, and repeat penalty sampling controls
  - Random seed support for reproducible generation
  - Token counting and generation time tracking
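
  A sketch of a non-streaming request exercising these controls. The three class names come from this release; the parameter and property names (`modelPath`, `temperature`, `topP`, `topK`, `repeatPenalty`, `seed`, `text`) are assumptions:

  ```dart
  import 'package:dart_llama/dart_llama.dart';

  void main() {
    final model = LlamaModel(modelPath: 'models/gemma-3-1b.gguf');
    final response = model.generate(
      GenerationRequest(
        prompt: 'Write a haiku about FFI.',
        temperature: 0.7, // higher values give more varied output
        topP: 0.9, // nucleus sampling cutoff
        topK: 40, // sample only from the 40 most likely tokens
        repeatPenalty: 1.1, // discourage verbatim repetition
        seed: 42, // fixed seed for reproducible generation
      ),
    );
    print(response.text);
    model.dispose();
  }
  ```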
- **Stop Sequence Support** (streaming example below)
  - Configurable stop sequences in `GenerationRequest`
  - Proper handling of stop sequences split across multiple tokens
  - Automatic trimming of stop sequences from output
  - Works correctly in both streaming and non-streaming modes
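
  A streaming sketch with a stop sequence; the `stopSequences` parameter name and the `stream` method are assumptions based on the feature list:

  ```dart
  import 'dart:io';

  import 'package:dart_llama/dart_llama.dart';

  Future<void> main() async {
    final model = LlamaModel(modelPath: 'models/gemma-3-1b.gguf');
    final request = GenerationRequest(
      prompt: 'User: What is a GGUF file?\nAssistant:',
      // Generation halts when this appears, even if it is split across
      // several tokens, and the sequence is trimmed from the output.
      stopSequences: ['User:'],
    );
    // Hypothetical streaming entry point yielding tokens as they decode.
    await for (final token in model.stream(request)) {
      stdout.write(token);
    }
    model.dispose();
  }
  ```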
- **Memory Management** (pattern shown below)
  - Fixed a double-free error in sampler disposal
  - Proper lifecycle management for all native resources
  - RAII pattern with reliable `dispose()` methods
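
  The pattern this implies: wrap use in `try`/`finally` so native resources are released even when generation throws. Constructor and call shapes are assumptions:

  ```dart
  import 'package:dart_llama/dart_llama.dart';

  void main() {
    final model = LlamaModel(modelPath: 'models/gemma-3-1b.gguf');
    try {
      final response = model.generate(GenerationRequest(prompt: 'Hi'));
      print(response.text);
    } finally {
      // Always release native resources, even on an exception path.
      model.dispose();
    }
  }
  ```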
- **Examples**
  - `example/completion.dart` - Simple text completion with streaming support
  - `example/gemma_chat.dart` - Full Gemma chat implementation with proper formatting
- **Developer Experience**
  - Automated FFI binding generation with ffigen
  - Comprehensive build scripts for llama.cpp compilation
  - Model download script for testing (Gemma 3 1B)
  - Unit and integration tests
  - Code quality enforcement with very_good_analysis
- **Platform Support**
  - macOS (ARM64 and x86_64) with Metal acceleration
  - Linux (x86_64)
  - Windows (x86_64) - untested