flutter_litert_lm 0.3.0
Flutter plugin for Google's LiteRT-LM — run Large Language Models on-device with CPU, GPU (OpenCL) or NPU acceleration. Supports Gemma, Qwen, Phi, DeepSeek and more.
Changelog #
All notable changes to flutter_litert_lm are documented in this file. The
format follows Keep a Changelog and
this project adheres to Semantic Versioning.
[Unreleased] #
0.3.0 — 2026-04-11 #
Changed #
- Bumped LiteRT-LM runtime to upstream `176953b` (2026-04-11), up from the
  `12db62a` snapshot v0.2.0 shipped. The new runtime understands the
  `multi-prefill-seq_q8_ekv4096` packaging format that HuggingFace
  `litert-community` now uses for every non-Gemma model (Qwen 2.5 / Qwen 3 /
  Phi-4 mini / DeepSeek R1 distill / etc.). Under v0.2.0, `engine_create()`
  would either fail outright on those files or load the weights but emit
  garbage during decoding — the runtime was missing the flatbuffer schema
  entries for the extended prefill descriptor. v0.3.0 handles them
  transparently; the public C API surface is unchanged.
Added #
- Verified loads on HuggingFace `litert-community/Qwen2.5-1.5B-Instruct`,
  `litert-community/Qwen3-0.6B`, `litert-community/Phi-4-mini-instruct`, and
  `litert-community/DeepSeek-R1-Distill-Qwen-1.5B`, in addition to the
  existing Gemma 4 E2B / E4B / Gemma 3 1B support.
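Loading one of these newly verified models looks roughly like the sketch below. Only `LiteLmEngine`, `LiteLmConversation`, and `sendMessage` are named in this changelog; the `create(...)` factory, its parameter names, and the `LiteLmBackend` enum are assumptions made for illustration, not the plugin's documented API.

```dart
// Hedged sketch — constructor and parameter names are assumptions.
import 'package:flutter_litert_lm/flutter_litert_lm.dart';

Future<void> loadQwen(String modelPath) async {
  // `modelPath` points at a downloaded .litertlm file, e.g. the
  // litert-community/Qwen3-0.6B package from HuggingFace.
  final engine = await LiteLmEngine.create(
    modelPath: modelPath,
    backend: LiteLmBackend.cpu, // hypothetical enum value
  );
  final conversation = await engine.createConversation();
  final reply = await conversation.sendMessage('Hello!');
  print(reply);
}
```

Under v0.2.0 the same call would fail (or decode garbage) for these non-Gemma packages; under v0.3.0 no app-side change is needed beyond the runtime bump.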
Rebuild required #
- Every developer needs to re-run `scripts/build_ios_frameworks.sh` once,
  because v0.3.0 ships a new `LiteRTLM.xcframework` built from the newer
  upstream source. Old cached artifacts are not compatible.
0.2.0 — 2026-04-11 #
Added #
- iOS support (beta) — the plugin now runs the LiteRT-LM C++ runtime
  natively on iOS via a vendored `LiteRTLM.xcframework` built from upstream
  source. Previously iOS method-channel calls returned `UNSUPPORTED`.
- `scripts/build_ios_frameworks.sh` — one-shot helper that clones upstream
  LiteRT-LM, cross-compiles `libc_engine.a` for `macos_arm64`, `ios_arm64`,
  and `ios_sim_arm64` via Bazel, wraps the outputs into proper static and
  dynamic framework bundles, assembles two XCFrameworks, and drops them into
  `ios/Frameworks/`. The first run takes 30-60 minutes; subsequent runs are
  cached by Bazel.
- `ios/Classes/LiteLmNativeBridge.{h,mm}` — Objective-C++ bridge that wraps
  the LiteRT-LM C API (`engine.h`) for Swift. Handles engine / conversation
  lifecycle, synchronous and streaming message delivery, and captures native
  stderr so errors from the runtime surface as `NSError` messages instead of
  opaque `NULL` returns.
- Example app's `_BackendSelector` now shows a read-only CPU chip on iOS
  with a note explaining that the Metal accelerator is not available yet,
  instead of GPU/NPU buttons that would always fail at load time.
- README documents the iOS setup flow, the Bazel build prerequisites, and
  the iOS-specific troubleshooting entries (`NOT_FOUND: Engine type not
  found`, `UNIMPLEMENTED: Sampler type`, etc.).
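The backend-downgrade behavior described for the example app can be sketched as a small helper. The `_BackendSelector` widget name comes from the changelog; the `LiteLmBackend` enum and this helper are illustrative assumptions.

```dart
// Hedged sketch — mirrors the described behavior; names are assumptions.
import 'dart:io' show Platform;

// Hypothetical backend enum matching the plugin's CPU/GPU/NPU options.
enum LiteLmBackend { cpu, gpu, npu }

/// On iOS only the CPU (XNNPACK) backend is available in this release, so
/// any GPU/NPU request is downgraded to CPU instead of failing at load time.
LiteLmBackend effectiveBackend(LiteLmBackend requested) {
  if (Platform.isIOS) return LiteLmBackend.cpu;
  return requested;
}
```

This keeps the load path from hitting the runtime's `NOT_FOUND: Engine type not found` error for accelerators that were never shipped for iOS.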
Fixed #
- Podspec now passes `-all_load` to the pod target's `OTHER_LDFLAGS` so the
  linker keeps every object file from `libc_engine.a`, including the
  engine-factory static constructors. Without this, `engine_create` failed
  at runtime with `NOT_FOUND: Engine type not found: 1`.
- Podspec and example `Podfile` exclude `x86_64` from the simulator
  architectures so the Apple Silicon-only XCFrameworks link cleanly on
  modern Macs.
Known iOS limitations #
- CPU (XNNPACK) backend only. The LiteRT-LM Metal GPU and WebGPU accelerators exist as prebuilt dylibs for macOS but have not been released for iOS yet — tracked upstream in LiteRT-LM#1050.
- Sampler knobs are ignored. The iOS C API currently reports
  `UNIMPLEMENTED: Sampler type: N not implemented yet` for both kTopK (1)
  and kGreedy (3), so the plugin passes a `NULL` session config and lets the
  engine use whatever sampler is baked into the model's own metadata.
- XCFrameworks only contain `arm64` slices — no Intel simulator support.
- XCFrameworks are not distributed with the package (they're ~370 MB on
  disk). Every developer must run `scripts/build_ios_frameworks.sh` locally
  before their first iOS build.
0.1.0 — 2026-04-10 #
Initial public release.
Added #
- Android support targeting
  `com.google.ai.edge.litertlm:litertlm-android:0.10.0`.
- `LiteLmEngine` for loading `.litertlm` model files with CPU, GPU (OpenCL),
  or NPU backends.
- `LiteLmConversation` for multi-turn chat with system instructions, sampler
  config, optional tools, and initial-message seeding.
- `sendMessage` for full responses and `sendMessageStream` for
  token-by-token streaming.
- Multimodal `sendMultimodalMessage` for text + image + audio inputs.
- Tool / function calling via `LiteLmTool` and `sendToolResponse`.
- Consumer Proguard / R8 keep rules shipped with the AAR so release builds
  don't strip the LiteRT-LM JNI surface.
- `<uses-native-library>` manifest entries merged into the host app for
  `libOpenCL.so` (Qualcomm Adreno), `libOpenCL-pixel.so` (Pixel Tensor), and
  `libOpenCL-car.so` (Android Auto) so the GPU backend works on Android 12+.
- Example app with model picker, on-device download with progress, backend
  selector, streaming chat, and per-response inference stats (tokens, tok/s,
  time-to-first-token, total duration).
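Token-by-token streaming with `sendMessageStream` (named above) might look like the following. The method name is from the changelog; the assumption that it returns a `Stream<String>` of token chunks is illustrative, not confirmed by this document.

```dart
// Hedged sketch — assumes sendMessageStream yields Stream<String> chunks.
import 'package:flutter_litert_lm/flutter_litert_lm.dart';

Future<String> streamReply(LiteLmConversation conversation) async {
  final buffer = StringBuffer();
  // Consume tokens as they arrive so the UI can render partial output.
  await for (final token in conversation.sendMessageStream('Tell me a joke')) {
    buffer.write(token);
  }
  return buffer.toString();
}
```

The same conversation object would serve `sendMessage` when a single full response is preferred over incremental rendering.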
Known limitations #
- iOS bridge is currently a stub — Google's LiteRT-LM Swift SDK is still in
  development. Method-channel calls return `UNSUPPORTED` until upstream
  ships.
- `countTokens` returns `-1` because token counting is not yet exposed by
  the upstream public API.
- Streaming downloads do not yet resume on network interruption — they
  restart from byte zero on retry.
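Because `countTokens` signals "not yet supported" with `-1` rather than throwing, callers may want to normalize that sentinel. Only the method name and the `-1` behavior come from this changelog; the wrapper below and its receiver type are assumptions.

```dart
// Hedged sketch — treats the documented -1 sentinel as "unknown".
import 'package:flutter_litert_lm/flutter_litert_lm.dart';

/// Returns the token count, or null when counting is not yet exposed by
/// the upstream public API (signalled by a negative return value).
Future<int?> tokenCountOrNull(LiteLmEngine engine, String text) async {
  final n = await engine.countTokens(text);
  return n < 0 ? null : n;
}
```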