juice_llm 0.2.0
On-device LLM inference as a Juice bloc — model lifecycle, streaming generation, and embeddings behind a swappable runtime seam.
Changelog
0.2.0
Added (multimodal input, non-breaking)
- `LlmCapability.audio`: alongside the existing `vision`, for models that take audio clips (voice memos) as input.
- `LlmMessage.audio`: a `List<Uint8List>` of encoded audio (WAV/MP3/FLAC), mirroring the existing `images`. A multimodal provider feeds them to the model; text-only models ignore them.
- `LlmLoadOptions.projectorPath`: optional path to a multimodal projector (mmproj GGUF). When set, a multimodal-capable provider loads it and enables image/audio input. The projector pairs with the weights; the app acquires it and passes the resolved path (app-orchestrated provisioning; the acquisition seam stays single-artifact for now).
All additive — existing text-only usage is unchanged.
0.1.1
Fixed
- `FetchModelEvent` crashed with `type 'int' is not a subtype of 'double?'`. `FetchModelUseCase` emitted `fetchProgress: 0` (an int) and `LlmState.copyWith` cast it `as double?`. It now emits `0.0`, and `copyWith` coerces through `num?` with `.toDouble()`. The whole fetch lifecycle was untested (the Echo/Fake providers skip it); a fetch-lifecycle regression test now covers it. Surfaced by the Glean dogfood (the first real `ModelSource`).
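The failure mode behind this fix can be reproduced in plain Dart. The sketch below is illustrative only and does not use the package's code: `castDirectly` and `coerce` are hypothetical helper names. It assumes the native Dart VM, where `int` and `double` are distinct runtime types; on web targets all numbers share one representation, so the direct cast may not throw there.

```dart
/// Mirrors the failing pattern: a direct cast rejects an int at runtime on the
/// Dart VM, because int is not a subtype of double.
double? castDirectly(Object? value) => value as double?;

/// The coercing pattern: accept any num (or null), then convert to double.
double? coerce(Object? value) => (value as num?)?.toDouble();

void main() {
  final Object? progress = 0; // an int, like `fetchProgress: 0`

  try {
    castDirectly(progress);
    print('direct cast succeeded (expected on web targets)');
  } on TypeError catch (e) {
    // On the VM this reports that 'int' is not a subtype of 'double?'.
    print('direct cast failed: $e');
  }

  // The coercing version handles ints, doubles, and null alike.
  print(coerce(progress));
  print(coerce(0.5));
  print(coerce(null));
}
```

Widening the cast to `num?` before calling `toDouble()` keeps the field typed as `double?` while tolerating callers that pass an integer literal.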
0.1.0
Initial release — Reviewed.
- `LlmBloc`: on-device LLM inference as a bloc, with a model-lifecycle state machine (absent → fetching → fetched → loading → ready / error) plus streaming generation and embedding sessions.
- Seams: `LlmProvider` (runtime) and `ModelSource` (weight acquisition + checksum verify), following the `AuthProvider`/`FlagsSource` pattern.
- `EchoLlmProvider`: pure-Dart, zero-dependency reference runtime and the runnable default (streams a reflective reply word-by-word; deterministic embeddings).
- Per-request rebuild groups (`LlmGroups.gen(id)`) with throttled streaming emissions (coalesced to ≤ one per `streamThrottle`, terminal always flushed).
- Concurrency: `GenerateEvent` is `sequential` (one runtime context), `CancelGenerationEvent` is `concurrent` (out-of-band stop); one terminal finalize point so the queue never wedges on cancel.
- Fail-loud: no-model generate fails its session; load failure surfaces with no silent fallback model; checksum mismatch deletes + throws; embeddings capability guard; no load/unload under an active generation.
- Bounded session retention (`maxRetainedSessions`) + explicit `evictSession`.
- Example app: Echo runtime by default, with `OllamaLlmProvider` (real local model over HTTP) as the seam-swap reference.
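As a rough illustration of the throttled streaming behaviour listed above (at most one emission per throttle window, with the terminal value always flushed), here is a minimal sketch. It is not the package's implementation: `ThrottledEmitter`, `runDemo`, and the injected timestamps are all hypothetical, and the sketch uses a simple leading-edge throttle with an explicit final flush rather than timers.

```dart
/// Hypothetical helper, not part of juice_llm: coalesces streamed text so that
/// at most one emission happens per [interval]; [finish] always delivers the
/// last pending text.
class ThrottledEmitter {
  ThrottledEmitter(this.interval, this.emit);

  final Duration interval;
  final void Function(String text) emit;

  DateTime? _lastEmit;
  String? _pending;

  /// Records the latest accumulated text; emits only if the interval elapsed.
  void add(String text, DateTime now) {
    _pending = text;
    final last = _lastEmit;
    if (last == null || now.difference(last) >= interval) {
      _flush(now);
    }
  }

  /// Terminal flush: delivers whatever is still pending, ignoring the throttle.
  void finish(DateTime now) {
    if (_pending != null) _flush(now);
  }

  void _flush(DateTime now) {
    emit(_pending!);
    _pending = null;
    _lastEmit = now;
  }
}

/// Feeds four accumulated snapshots at fixed offsets and returns what was emitted.
List<String> runDemo() {
  final emitted = <String>[];
  final emitter =
      ThrottledEmitter(const Duration(milliseconds: 100), emitted.add);
  final start = DateTime(2024);
  DateTime at(int ms) => start.add(Duration(milliseconds: ms));

  emitter.add('a', at(0)); // first update: emitted immediately
  emitter.add('ab', at(40)); // inside the window: held back
  emitter.add('abc', at(120)); // window elapsed: emitted, replacing 'ab'
  emitter.add('abcd', at(150)); // inside the new window: held back
  emitter.finish(at(160)); // terminal flush delivers 'abcd'
  return emitted;
}

void main() {
  print(runDemo()); // [a, abc, abcd]
}
```

Passing the clock in as an argument keeps the sketch deterministic; a real implementation would more likely schedule a trailing timer so held-back text is delivered even if no further tokens arrive before the stream ends.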