llm_llamacpp 0.1.9
llm_llamacpp: ^0.1.9 copied to clipboard
llama.cpp backend implementation for LLM interactions. Enables local on-device inference with GGUF models on Android, iOS, macOS, Windows, and Linux.
Changelog #
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
[Unreleased] #
0.1.9 - 2026-02-28 #
Changed #
maxToolAttemptsdefault increased from 25 to 90- Bumped
llm_coredependency to ^0.1.9
0.1.8 - 2026-02-26 #
Changed #
- Bumped
llm_coredependency to ^0.1.8 for tool calling stream visibility
0.1.7 - 2026-02-10 #
Added #
batchEmbed()implementation: delegates to existingembed()(processes multiple messages in the embedding isolate).
0.1.6 - 2026-02-10 #
Fixed #
- Confirmed that parsed tool calls from llama.cpp outputs always include non-null, non-empty
LLMToolCall.idvalues across supported formats (JSON, XML-style, and function-style), maintaining compatibility withllm_coretool-calling expectations. - Added tests for
ToolCallParserto verify that tool call IDs are populated correctly for downstreamtoolCallIdusage.
0.1.5 - 2026-01-26 #
Added #
- Support for
StreamChatOptionsinstreamChat()method - Support for
chatResponse()method for non-streaming complete responses - Input validation for model names and messages
- Improved isolate-based inference handling
Changed #
streamChat()now accepts optionalStreamChatOptionsparameter- Improved error handling
- Enhanced documentation
0.1.0 - 2026-01-19 #
Added #
- Initial release
- Local on-device inference with GGUF models via llama.cpp
- Cross-platform support: Android, iOS, macOS, Windows, Linux
- Streaming token generation with isolate-based inference
- Multiple prompt templates: ChatML, Llama2, Llama3, Alpaca, Vicuna, Phi-3
- Tool calling support via prompt convention
- GPU acceleration support (CUDA, Metal, Vulkan)
- Model management features:
- Model discovery in directories
- Model loading with pooling (reference counting)
- GGUF metadata reading without loading
- HuggingFace model downloading
- Safetensors to GGUF conversion
- Native Assets build hook for automatic binary management
- Prebuilt binaries available via GitHub Releases