mt_llmkit 0.0.1-beta.1
A Flutter plugin for running Large Language Models locally on Android and iOS via llama.cpp, with real-time streaming inference and performance metrics.
0.0.1-beta.1
Initial beta release of mt_llmkit.
Features
- Local GGUF inference — run quantized LLMs entirely on-device via llamadart, with no internet connection required
- Two execution backends — ModelBackend.isolate (default, Dart Isolate, no UI jank) and ModelBackend.inProcess (lighter startup, supports clean())
- Three generation methods on LocalModel / LlmInterface:
  - sendPrompt — raw token stream
  - sendPromptComplete — full response as a single String
  - sendPromptStream — token stream with live PerformanceMetrics (recommended)
- Vision / multimodal — supports LLaVA, Gemma 3, Qwen VL, SmolVLM and any libmtmd-compatible model via LlmConfig.mmprojPath and LlamaImageContent
- Performance metrics — PerformanceMetrics with tokensGenerated, durationMs, tokensPerSecond, msPerToken, updated on every StreamingChunk
- Cloud AI chat providers — unified AIChatProvider interface with implementations for OpenAI, Google Gemini, Anthropic Claude, and Mistral AI; automatic retry with exponential back-off
- Local RAG pipeline — fully on-device RagEngine with document chunking, embedding (via a separate CPU isolate), cosine-similarity vector search, and optional index persistence
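
The cosine-similarity vector search named in the Local RAG pipeline item can be illustrated with a minimal, self-contained Dart sketch. It is not the RagEngine implementation and uses no mt_llmkit API: the function names cosineSimilarity and topKChunks are hypothetical, and embeddings are assumed to be plain List<double> vectors of equal length.

```dart
import 'dart:math' as math;

/// Cosine similarity between two equal-length embedding vectors.
/// Hypothetical helper for illustration; not part of mt_llmkit.
double cosineSimilarity(List<double> a, List<double> b) {
  if (a.length != b.length) {
    throw ArgumentError('Vectors must have the same length');
  }
  var dot = 0.0, normA = 0.0, normB = 0.0;
  for (var i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat its similarity as 0.
  if (normA == 0 || normB == 0) return 0.0;
  return dot / (math.sqrt(normA) * math.sqrt(normB));
}

/// Returns the indices of the [topK] chunk embeddings most similar
/// to [query], best match first. Hypothetical helper for illustration.
List<int> topKChunks(
    List<double> query, List<List<double>> chunks, int topK) {
  final scored = [
    for (var i = 0; i < chunks.length; i++)
      MapEntry(i, cosineSimilarity(query, chunks[i])),
  ];
  // Sort by descending similarity score.
  scored.sort((x, y) => y.value.compareTo(x.value));
  return scored.take(topK).map((e) => e.key).toList();
}

void main() {
  // Toy 2-dimensional "embeddings" standing in for real chunk vectors.
  final chunks = [
    [1.0, 0.0],
    [0.0, 1.0],
    [0.7, 0.7],
  ];
  print(topKChunks([1.0, 0.1], chunks, 2)); // prints [0, 2]
}
```

A brute-force scan like this is linear in the number of chunks, which is usually acceptable for the small indexes typical of on-device use; whether RagEngine uses a linear scan or another index structure is not stated here.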