dart_llama 0.2.0

dart_llama: ^0.2.0

A Dart package for interfacing with llama.cpp models using FFI

Changelog

0.2.0 - 2025-01-21

Added

  • Dart CLI Build Tool (dart_llama_tool)

    • global-install - One-command global installation: builds the libraries, installs them, and activates the CLI globally
    • setup - Complete setup: builds llama.cpp and the wrapper, regenerates FFI bindings, downloads the test model, and runs the tests
    • build - Build both llama.cpp and wrapper libraries
    • build-llama - Build only llama.cpp library (supports --static flag)
    • build-wrapper - Build only the wrapper library (supports --static flag)
    • compile - Compile CLI tools to native executables with bundled libraries
    • install-lib - Install wrapper library to ~/.dart_llama/ for global CLI usage
    • ffigen - Regenerate FFI bindings
    • download-model - Download the Gemma 3 1B model for testing
    • clean - Remove built libraries, static libraries, dist folder, and llama.cpp source
  • Static Linking Support

    • build-llama --static builds llama.cpp as a static library
    • build-wrapper --static links wrapper with llama.cpp statically
    • A single libllama_wrapper.dylib contains all native code, making distribution easier
    • No dependency on separate libllama.dylib when using static linking
  • Native Executable Compilation

    • compile command creates a dist/ folder with native executables
    • Bundles the executables with the required dynamic library
  • Globally Installable CLI Tools

    • ldcompletion - Text completion CLI with streaming support
    • ldchat - Interactive Gemma chat CLI with streaming support
  • Context Management

    • clearContext() method to reset the KV cache between chat turns
    • Fixes the context-overflow error in long conversations (see the sketch after this list)
  • Typed Exception Hierarchy

    • LlamaException - sealed base class for all llama-related errors
    • ModelLoadException - thrown when model fails to load
    • ContextCreationException - thrown when context creation fails
    • TokenizationException - thrown when tokenization fails
    • PromptTooLongException - thrown when prompt exceeds limits
    • ContextOverflowException - thrown when context window fills up
    • DecodeException - thrown when decoding fails
    • Enables exhaustive pattern matching for error handling instead of string matching (also shown in the sketch after this list)
  • Comprehensive Test Suite

    • Unit tests for all data models and exception classes
    • Tests for LlamaModel lifecycle and error handling
    • Tests for clearContext() method
    • Tests for stop sequences in both generate and stream modes
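
Taken together, the new clearContext() method and the typed exceptions enable a chat loop along the following lines. This is a minimal sketch: only the class, method, and exception names come from this release, while the constructor parameters, the generate() call shape, and the response's text property are assumptions made for illustration.

```dart
import 'dart:io';

import 'package:dart_llama/dart_llama.dart';

void main() {
  // Constructor shape is an assumption; see the API reference for the
  // actual parameters.
  final model = LlamaModel(modelPath: 'models/gemma-3-1b.gguf');
  try {
    for (final prompt in ['Hello!', 'What is Dart FFI?']) {
      // Reset the KV cache between turns so long conversations no
      // longer overflow the context window.
      model.clearContext();
      final response = model.generate(GenerationRequest(prompt: prompt));
      print(response.text); // `text` is an assumed property name
    }
  } on LlamaException catch (e) {
    // The sealed base class allows matching on error types instead of
    // on error-message strings.
    final message = switch (e) {
      ModelLoadException() => 'Model failed to load',
      ContextCreationException() => 'Context creation failed',
      PromptTooLongException() => 'Prompt exceeds the context limit',
      ContextOverflowException() => 'Context window filled up',
      TokenizationException() || DecodeException() => 'Token processing failed',
      _ => 'Unexpected llama error: $e',
    };
    stderr.writeln(message);
  } finally {
    model.dispose();
  }
}
```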

Changed

  • Project Structure Reorganization

    • Moved llama_wrapper.c and llama_wrapper.h to native/ directory
    • Updated build scripts and ffigen configuration for new paths
  • llama.cpp Management

    • Pinned llama.cpp to version b7783 for reproducible builds
    • llama.cpp is now fetched during the build (not stored in the repo)
    • Uses a shallow clone for faster downloads (see the sketch after this list)
  • Build System

    • Replaced the bash scripts with a Dart CLI tool built on the args package
    • Removed the now-redundant scripts/ directory
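
As a rough sketch of what such a pinned, shallow fetch can look like in Dart (the b7783 tag comes from this release; the function name, directory layout, and error handling are illustrative, not the tool's actual code):

```dart
import 'dart:io';

/// Shallow-clones llama.cpp at a pinned release tag, so every build
/// compiles the same upstream sources. Illustrative sketch only.
Future<void> fetchLlamaCpp({String tag = 'b7783'}) async {
  final result = await Process.run('git', [
    'clone',
    '--depth', '1', // shallow clone: fetch only the pinned commit
    '--branch', tag, // fixed release tag for reproducible builds
    'https://github.com/ggerganov/llama.cpp.git',
    'build/llama.cpp',
  ]);
  if (result.exitCode != 0) {
    throw ProcessException('git', ['clone'], result.stderr as String);
  }
}
```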

0.1.2 - 2025-08-05

Changed

  • Renamed example/completion.dart to example/main.dart for better pub.dev scoring
  • Added comprehensive documentation for the GenerationResponse class and all of its properties

0.1.1 - 2025-08-04

Documentation

  • Added comprehensive documentation comments to all public API elements
  • Added library-level documentation with examples and getting started guide
  • Documented all GenerationRequest parameters with usage guidance
  • Documented all LlamaConfig parameters with recommendations
  • Improved overall API documentation coverage from 26.5% to 100%

0.1.0 - 2025-08-04

Initial Release

  • Core Features

    • FFI-based Dart bindings for llama.cpp
    • Low-level LlamaModel API for direct text generation control
    • Support for loading GGUF model files
    • Automatic memory management with proper cleanup
    • Real-time streaming support with token-by-token generation
    • Configurable stop sequences for controlling generation boundaries
  • API Features

    • LlamaModel - Main class for model initialization and text generation
    • GenerationRequest - Configurable generation parameters
    • GenerationResponse - Detailed generation results with token counts
    • Streaming and non-streaming generation modes
    • Temperature, top-p, top-k, and repeat penalty sampling controls
    • Random seed support for reproducible generation
    • Token counting and generation time tracking
  • Stop Sequence Support

    • Configurable stop sequences in GenerationRequest
    • Proper handling of stop sequences split across multiple tokens
    • Automatic trimming of stop sequences from output
    • Works correctly in both streaming and non-streaming modes (see the sketch after this list)
  • Memory Management

    • Fixed a double-free error in sampler disposal
    • Proper lifecycle management for all native resources
    • RAII-style resource handling with reliable dispose() methods
  • Examples

    • example/completion.dart - Simple text completion with streaming support
    • example/gemma_chat.dart - Full Gemma chat implementation with proper formatting
  • Developer Experience

    • Automated FFI binding generation with ffigen
    • Comprehensive build scripts for llama.cpp compilation
    • Model download script for testing (Gemma 3 1B)
    • Unit and integration tests
    • Code quality enforcement with very_good_analysis
  • Platform Support

    • macOS (ARM64 and x86_64) with Metal acceleration
    • Linux (x86_64)
    • Windows (x86_64) - untested
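
To tie the initial API together, here is a minimal sketch of streaming generation with stop sequences. The parameter names on GenerationRequest (temperature, topK, topP, seed, stopSequences) and the stream() method are assumptions inferred from the feature list above, not verified signatures.

```dart
import 'dart:io';

import 'package:dart_llama/dart_llama.dart';

Future<void> main() async {
  final model = LlamaModel(modelPath: 'models/gemma-3-1b.gguf');
  final request = GenerationRequest(
    prompt: 'Write a haiku about FFI.',
    temperature: 0.7, // sampling controls from the feature list
    topK: 40,
    topP: 0.9,
    seed: 42, // fixed seed for reproducible generation
    stopSequences: ['\n\n'], // trimmed from the output automatically
  );

  // Streaming mode: tokens are printed as they are generated.
  await for (final token in model.stream(request)) {
    stdout.write(token);
  }
  stdout.writeln();
  model.dispose();
}
```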