LiteRtLmWebInferenceModel class

Web .litertlm inference via the upstream @litert-lm/core early-preview JS API (LiteRT-LM v0.12.0+ on web through WebGPU/WASM).

Mirrors FfiInferenceModel (the mobile/desktop wrapper around the same C API), but maps the calls onto the JS surface: Engine.create → engine.createConversation → conversation.sendMessageStreaming(text), which returns a JS AsyncIterator.
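The last step of that chain hands back an iterator rather than a Dart Stream, so some adapter has to sit in between. The following is a minimal sketch of one way to drain an async iterator into a Stream. SketchAsyncIterator, IterResult, FakeIterator and drain are hypothetical names invented for illustration; they are neither the package's real interop types nor the @litert-lm/core API, and the sketch assumes Dart 3 for record types.

```dart
import 'dart:async';

/// Hypothetical shape of one iterator step, modelled on the JS
/// `{done, value}` result object.
typedef IterResult = ({bool done, String? value});

/// Hypothetical stand-in for a JS AsyncIterator of text chunks.
abstract class SketchAsyncIterator {
  Future<IterResult> next();
}

/// Pulls chunks one at a time and re-emits them as a Dart Stream.
Stream<String> drain(SketchAsyncIterator it) async* {
  while (true) {
    final step = await it.next();
    if (step.done) return; // Iterator exhausted: the stream completes.
    final chunk = step.value;
    if (chunk != null) yield chunk;
  }
}

/// In-memory iterator used only to exercise `drain`.
class FakeIterator implements SketchAsyncIterator {
  FakeIterator(this._chunks);

  final List<String> _chunks;
  int _index = 0;

  @override
  Future<IterResult> next() async => _index < _chunks.length
      ? (done: false, value: _chunks[_index++])
      : (done: true, value: null);
}
```

Because drain is an async* generator, nothing is pulled from the iterator until a listener subscribes, and pausing the subscription pauses the pulls as well.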

Limitations (matches upstream early-preview status):

Text-in/text-out only: vision, audio, and thinking options are warn-and-ignore.
LoRA throws UnsupportedError (parity with the FFI path).
stopGeneration() closes the local stream and calls the upstream conversation.cancel() to abort the JS-side generation. The call is wrapped in try/catch because the early-preview API may throw if nothing is in flight.
For models >2 GB, use WebStorageMode.streaming so the resolver returns an OpfsStreamModelSource; passing a Blob URL to Engine.create trips Chrome's ERR_BLOB_OUT_OF_MEMORY limit. The WebModelSourceResolver handles the routing transparently, on the same path the MediaPipe WebInferenceModel uses today.
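The stopGeneration() behaviour described in the list above can be sketched as follows. This is an illustration under stated assumptions, not the package's implementation: FakeConversation and SketchSession are hypothetical stand-ins, and only the two steps named in the text (close the local stream, then a guarded cancel) are modelled.

```dart
import 'dart:async';

/// Hypothetical stand-in for the JS conversation handle. Its `cancel`
/// throws when nothing is in flight, mimicking the early-preview caveat.
class FakeConversation {
  bool inFlight = false;
  int cancelCalls = 0;

  void cancel() {
    cancelCalls++;
    if (!inFlight) throw StateError('nothing to cancel');
    inFlight = false;
  }
}

/// Hypothetical session wrapper, not the package's real session class.
class SketchSession {
  SketchSession(this.conversation);

  final FakeConversation conversation;
  final StreamController<String> _tokens = StreamController<String>();

  Stream<String> get tokens => _tokens.stream;

  Future<void> stopGeneration() async {
    // Step 1: close the local stream so Dart listeners receive `done`.
    // Not awaited: close() only completes once a listener has seen `done`.
    if (!_tokens.isClosed) {
      unawaited(_tokens.close());
    }
    // Step 2: ask the JS side to abort. Errors are swallowed because the
    // upstream call may throw when no generation is running.
    try {
      conversation.cancel();
    } catch (_) {
      // Nothing was in flight; safe to ignore.
    }
  }
}
```

Guarding the cancel call makes stopGeneration() safe to invoke at any time, including before the first message or after a generation has already finished.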

Inheritance

Object
InferenceModel
LiteRtLmWebInferenceModel

Constructors

LiteRtLmWebInferenceModel({required WebModelSourceResolver sourceResolver, required int maxTokens, required ModelType modelType, ModelFileType fileType = ModelFileType.litertlm, int? maxConcurrentSessions, required VoidCallback onClose})

Properties

activeBackend → PreferredBackend?: Backend that the runtime initialized for this model, when known.
no setter, override
chat ↔ InferenceChat?: getter/setter pair, inherited
fileType → ModelFileType: final
generationMutex → Mutex: Serializes generation across all sessions on this model: contexts are concurrent, inference is serialized, matching the FFI/MediaPipe paths. The @litert-lm/core engine is shared WebGPU/WASM state, so parallel generations would contend for the accelerator. Passed to each session.
final
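To illustrate what serializing generation across sessions means in practice, here is a minimal sketch of a future-chaining lock. SketchMutex and runTwoGenerations are hypothetical names; the Mutex type the class actually uses may expose a different API, and the delay stands in for real inference.

```dart
import 'dart:async';

/// Minimal illustrative lock: each caller waits for the previous one.
class SketchMutex {
  Future<void> _tail = Future<void>.value();

  Future<T> protect<T>(Future<T> Function() action) {
    final previous = _tail;
    final released = Completer<void>();
    _tail = released.future;
    // Run after the previous holder; release even if `action` throws.
    return previous.then((_) => action()).whenComplete(() {
      released.complete();
    });
  }
}

/// Two "sessions" request generation at the same time; the lock forces
/// them to run back to back.
Future<List<String>> runTwoGenerations() async {
  final mutex = SketchMutex();
  final log = <String>[];

  Future<void> generate(String name) => mutex.protect(() async {
        log.add('$name:start');
        await Future<void>.delayed(const Duration(milliseconds: 10));
        log.add('$name:end');
      });

  await Future.wait([generate('a'), generate('b')]);
  return log; // [a:start, a:end, b:start, b:end]
}
```

Both calls are issued concurrently, yet the second only starts once the first has finished, which is the "concurrent contexts, serialized inference" behaviour the property describes.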
hashCode → int: The hash code for this object.
no setter, inherited
maxConcurrentSessions → int?: Cap on concurrent openSession sessions; null = unlimited.
final
maxTokens → int: final
modelType → ModelType: final
onClose → VoidCallback: final
runtimeType → Type: A representation of the runtime type of the object.
no setter, inherited
session → InferenceModelSession?: The single session created via createSession. Singleton lane: each createSession call overwrites this field with a new session and closes the previous one.
no setter, override
sessions → List<InferenceModelSession>: Live sessions owned by this model: the union of the legacy session (if any) and every active openSession result. Returns an unmodifiable view; mutate via openSession, session.close(), or close.
no setter, override
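The relationship between session, sessions, and maxConcurrentSessions can be pictured with a small bookkeeping sketch. SketchModel and SketchSessionHandle are hypothetical names; throwing StateError when the cap is reached is an assumption made for illustration (the text does not say how the cap is enforced), and the sketch returns an unmodifiable snapshot where the real getter returns a view.

```dart
/// Hypothetical session handle that reports back to its owner on close.
class SketchSessionHandle {
  SketchSessionHandle(this._onClose);

  final void Function(SketchSessionHandle) _onClose;
  bool isClosed = false;

  void close() {
    if (isClosed) return;
    isClosed = true;
    _onClose(this);
  }
}

/// Hypothetical model-side bookkeeping, not the package's real class.
class SketchModel {
  SketchModel({this.maxConcurrentSessions});

  final int? maxConcurrentSessions;
  SketchSessionHandle? _legacy;
  final List<SketchSessionHandle> _open = [];

  SketchSessionHandle? get session => _legacy;

  /// Legacy session (if any) plus every open session, read-only.
  List<SketchSessionHandle> get sessions {
    final legacy = _legacy;
    return List<SketchSessionHandle>.unmodifiable(
      [if (legacy != null) legacy, ..._open],
    );
  }

  /// Singleton lane: closes the previous session and replaces it.
  SketchSessionHandle createSession() {
    _legacy?.close();
    final created = SketchSessionHandle((closed) {
      if (identical(_legacy, closed)) _legacy = null;
    });
    _legacy = created;
    return created;
  }

  /// Independent lane: capped by maxConcurrentSessions (null = unlimited).
  SketchSessionHandle openSession() {
    final cap = maxConcurrentSessions;
    if (cap != null && _open.length >= cap) {
      // Assumption: the real class may signal this differently.
      throw StateError('maxConcurrentSessions ($cap) reached');
    }
    final opened = SketchSessionHandle((closed) {
      _open.remove(closed);
    });
    _open.add(opened);
    return opened;
  }
}
```

In this sketch the cap counts only openSession results, following the wording of maxConcurrentSessions; closing a session removes it from sessions and frees a slot.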
sourceResolver → WebModelSourceResolver: Shared with WebInferenceModel. Resolves the active model into either a BlobUrlModelSource (cacheApi/none) or an OpfsStreamModelSource (streaming). Engine-specific glue lives in _ensureEngine.
final

Methods

close() → Future<void>: override
createChat({double temperature = .8, int randomSeed = 1, int topK = 1, double? topP, int tokenBuffer = 256, String? loraPath, bool? supportImage, bool? supportAudio, List<Tool> tools = const [], bool? supportsFunctionCalls, bool isThinking = false, ModelType? modelType, ToolChoice toolChoice = ToolChoice.auto, int? maxFunctionBufferLength, String? systemInstruction}) → Future<InferenceChat>: inherited
createSession({double temperature = .8, int randomSeed = 1, int topK = 1, double? topP, String? loraPath, bool? enableVisionModality, bool? enableAudioModality, String? systemInstruction, bool enableThinking = false, List<Tool> tools = const []}) → Future<InferenceModelSession>: Creates a new InferenceModelSession for generation.
override
noSuchMethod(Invocation invocation) → dynamic: Invoked when a nonexistent method or property is accessed.
inherited
openChat({double temperature = .8, int randomSeed = 1, int topK = 1, double? topP, int tokenBuffer = 256, String? loraPath, bool? supportImage, bool? supportAudio, List<Tool> tools = const [], bool? supportsFunctionCalls, bool isThinking = false, ModelType? modelType, ToolChoice toolChoice = ToolChoice.auto, int? maxFunctionBufferLength, String? systemInstruction}) → Future<InferenceChat>: Same as createChat, but uses openSession internally so the resulting chat owns an independent session that does not touch the legacy session field or other open chats. Use this when you need concurrent chats on a single loaded model.
inherited
openSession({double temperature = .8, int randomSeed = 1, int topK = 1, double? topP, String? loraPath, bool? enableVisionModality, bool? enableAudioModality, String? systemInstruction, bool enableThinking = false, List<Tool> tools = const []}) → Future<InferenceModelSession>: Opens a new session detached from session. Each call returns a fresh independent session sharing the loaded model weights but with isolated context (history / KV cache). Use this for concurrent dialogues on a single loaded model.
override
toString() → String: A string representation of this object.
inherited

Operators

operator ==(Object other) → bool: The equality operator.
inherited

flutter_gemma_web library