llamadart 0.5.2
A Dart/Flutter plugin for llama.cpp - run LLM inference on any platform using GGUF models
example/README.md
llamadart Examples #
This directory contains example applications demonstrating how to use the llamadart package.
Available Examples #
1. Basic App (basic_app/) #
A simple console application showing:
- Model loading
- Context creation
- Tokenization
- Text generation
- Resource cleanup
Best for: Understanding the core API
Run:
cd basic_app
dart pub get
dart run
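For orientation, the steps listed above map onto a short program. The sketch below is illustrative only: the identifiers (LlamaModel, createContext, tokenize, generate, dispose) are placeholder names, not the actual llamadart API, so treat basic_app/lib as the authoritative reference.

import 'package:llamadart/llamadart.dart';

Future<void> main() async {
  // NOTE: hypothetical names for illustration; the real llamadart API may differ.
  final model = await LlamaModel.load('model.gguf'); // model loading
  final context = model.createContext();             // context creation
  final tokens = context.tokenize('Hello');          // tokenization
  print('Prompt is ${tokens.length} tokens');
  final text = await context.generate('Hello');      // text generation
  print(text);
  context.dispose();                                  // resource cleanup
  model.dispose();
}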
2. Chat App (chat_app/) #
A Flutter UI application showing:
- Real-time chat interface
- Model configuration
- Settings persistence
- Streaming text generation
- Material Design UI
Best for: Real-world Flutter integration
Run:
cd chat_app
flutter pub get
flutter run
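The streaming behaviour chat_app builds on can be tried in plain Dart before wiring it into widgets. The snippet below uses a simulated token stream (fakeTokens is made up for this sketch) purely to show how partial output is accumulated as it arrives; in the Flutter app the same loop would drive a setState or StreamBuilder update instead of print.

import 'dart:async';

// Stand-in for whatever token stream the generation call produces.
Stream<String> fakeTokens() async* {
  for (final t in ['Hel', 'lo ', 'from ', 'a ', 'stream', '.']) {
    await Future<void>.delayed(const Duration(milliseconds: 50));
    yield t;
  }
}

Future<void> main() async {
  final buffer = StringBuffer();
  await for (final token in fakeTokens()) {
    buffer.write(token);      // append each token as it arrives
    print(buffer.toString()); // chat_app would refresh the message bubble here
  }
}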
Testing #
basic_app (Dart console):
cd basic_app
dart test
chat_app (Flutter UI):
cd chat_app
flutter test
Note: chat_app uses Flutter libraries (dart:ui), so dart test is not
the correct runner for that example.
Quick Start #
- Choose an example: Basic (console) or Chat (Flutter)
- Download a model (see each example's README)
- Run the example: Follow instructions in each subdirectory
Testing pub.dev Package #
These examples simulate how users will consume llamadart once it is published to pub.dev:
- They add llamadart as a dependency
- They rely on automatic library download
- They don't need to run build scripts
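Concretely, a consuming project only needs the package as a regular dependency; the version below is the one shown on this page, so prefer the latest release when adding it:

dart pub add llamadart

which results in a pubspec.yaml entry like:

dependencies:
  llamadart: ^0.5.2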
Common Models for Testing #
- TinyLlama (1.1B, ~638MB) - Fast, good for testing
- Llama 2 (7B, ~4GB) - More powerful, slower
- Mistral (7B, ~4GB) - Great performance
See HuggingFace for more: https://huggingface.co/models?search=gguf
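If you want to script the model download instead of grabbing it in a browser, a minimal Dart sketch is below; the URL is a placeholder to replace with the direct GGUF download link of whichever model you picked (see each example's README for specific models).

import 'dart:io';

Future<void> main() async {
  // Placeholder URL: point this at the GGUF file you chose on HuggingFace.
  const url = 'https://example.com/path/to/model.gguf';
  final client = HttpClient();
  final request = await client.getUrl(Uri.parse(url));
  final response = await request.close();
  await response.pipe(File('model.gguf').openWrite());
  client.close();
  print('Saved model.gguf (${File('model.gguf').lengthSync()} bytes)');
}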
Model Formats #
llamadart supports GGUF format models (converted for llama.cpp).
Architecture #
example/
├── basic_app/ # Console application
│ ├── lib/ # Dart code
│ ├── pubspec.yaml # Dependencies
│ └── README.md # Instructions
└── chat_app/ # Flutter application
├── lib/ # Flutter code
├── android/ # Android config
├── ios/ # iOS config
├── pubspec.yaml # Dependencies
└── README.md # Instructions
Need Help? #
- Check individual example README files
- Report issues: https://github.com/leehack/llamadart/issues
- Docs: https://github.com/leehack/llamadart
Requirements #
- Dart SDK 3.10.7 or higher
- For chat_app: Flutter 3.38.0 or higher
- Internet connection (for first run - downloads native libraries)
- 2GB RAM minimum, 4GB+ recommended
Platform Compatibility #
| Platform | Architecture(s) | GPU Backend | Status |
|---|---|---|---|
| macOS | arm64, x86_64 | Metal | ✅ Tested |
| iOS | arm64 (Device), arm64/x86_64 (Sim) | Metal (Device), CPU (Sim) | ✅ Tested |
| Android | arm64-v8a, x86_64 | Vulkan | ✅ Tested |
| Linux | arm64, x86_64 | Vulkan | 🟡 Expected (Vulkan Untested) |
| Windows | x64 | Vulkan | ✅ Tested |
| Web | WASM / WebGPU Bridge | CPU / Experimental WebGPU | ✅ Tested |
Web Notes #
- Web examples run on the llama.cpp bridge backend (WebGPU or CPU mode).
- The chat_app loader is local-first and falls back to jsDelivr bridge assets.
- You can prefetch a pinned bridge version into web/webgpu_bridge/ with: WEBGPU_BRIDGE_ASSETS_TAG=<tag> ./scripts/fetch_webgpu_bridge_assets.sh
- The fetch script defaults to universal Safari-compatible patching: WEBGPU_BRIDGE_PATCH_SAFARI_COMPAT=1 and WEBGPU_BRIDGE_MIN_SAFARI_VERSION=170400.
- chat_app/web/index.html also applies Safari compatibility patching at runtime before bridge initialization (including CDN fallback).
- Web model loading uses browser Cache Storage by default, so repeated loads of the same model URL can skip full re-download.
- Safari WebGPU uses a compatibility gate in llamadart: legacy bridge assets default to CPU fallback, while adaptive bridge assets can probe/cap GPU layers and auto-fallback to CPU when unstable.
- You can still bypass the legacy safeguard with window.__llamadartAllowSafariWebGpu = true before model load.
- Multimodal projector loading works on web via URL-based model/mmproj pairs.
- In chat_app, image/audio attachments on web are sent as browser file bytes; local file paths are native-only.
- Native LoRA runtime adapter flows are not available on web.
- chat_app on web uses model URLs rather than native file download storage.
Troubleshooting #
Common Issues:
- Failed to load library:
  - Check console for download messages
  - Ensure internet connection for first run
  - Verify GitHub releases are accessible
- Model file not found:
  - Download a model to the default location
  - Or set the LLAMA_MODEL_PATH environment variable (example after this list)
  - Or configure in app settings (chat_app)
- Slow performance:
  - Use smaller quantization (Q4_K_M recommended)
  - Reduce context size (nCtx parameter)
  - Enable GPU layers if available
- Flutter build errors:
  - Ensure the Flutter SDK is properly installed
  - Run flutter doctor to check setup
  - Reinstall dependencies with flutter clean && flutter pub get
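For the LLAMA_MODEL_PATH option above, setting the variable before launching an example looks like this (adjust the path to wherever you saved the model):

macOS/Linux: export LLAMA_MODEL_PATH=/path/to/model.gguf
Windows (PowerShell): $env:LLAMA_MODEL_PATH = "C:\path\to\model.gguf"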
Security Notes #
- Models downloaded from the internet should be from trusted sources
- Never share private/sensitive data with open-source models
- The app runs locally - no data is sent to external servers (except library download on first run)
Contributing #
To contribute new examples:
- Create a new subdirectory in example/
- Add a pubspec.yaml with llamadart as a dependency
- Include a README.md with setup instructions
- Test on multiple platforms if possible
- Add an integration test to runner.dart if applicable
License #
These examples are part of the llamadart project and follow the same license.