LiteRtLmWebInferenceModel class
Web .litertlm inference via the upstream @litert-lm/core early-preview
JS API (LiteRT-LM v0.12.0+, running on the web through WebGPU/WASM).
Mirrors FfiInferenceModel (mobile/desktop), which wraps the C API, but
maps the same calls onto the JS surface: Engine.create →
engine.createConversation → conversation.sendMessageStreaming(text),
which returns a JS AsyncIterator.
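The call chain described above can be sketched in plain Dart. Everything in this block is a stand-in: FakeEngine, FakeConversation, and ChunkIterator are hypothetical classes that only imitate the shape of Engine.create → createConversation → sendMessageStreaming. The real class goes through JS interop, which this page does not show. The sketch illustrates one possible way to adapt a pull-based async iterator into a Dart Stream<String>.

```dart
import 'dart:async';

/// Hypothetical pull-based iterator, standing in for a JS AsyncIterator.
/// `next()` resolves to the next text chunk, or null once exhausted.
class ChunkIterator {
  ChunkIterator(List<String> chunks) : _chunks = List.of(chunks);
  final List<String> _chunks;
  int _index = 0;

  Future<String?> next() async =>
      _index < _chunks.length ? _chunks[_index++] : null;
}

/// Hypothetical conversation: "generates" by echoing the prompt in two chunks.
class FakeConversation {
  ChunkIterator sendMessageStreaming(String text) =>
      ChunkIterator(['echo: ', text]);
}

/// Hypothetical engine with an async factory, like Engine.create.
class FakeEngine {
  static Future<FakeEngine> create() async => FakeEngine();
  FakeConversation createConversation() => FakeConversation();
}

/// Adapts the pull-based iterator into a Dart stream of chunks.
Stream<String> toStream(ChunkIterator iterator) async* {
  while (true) {
    final chunk = await iterator.next();
    if (chunk == null) return; // iterator exhausted, stream completes
    yield chunk;
  }
}

/// Runs the whole chain once and concatenates the streamed chunks.
Future<String> generateOnce(String prompt) async {
  final engine = await FakeEngine.create();
  final conversation = engine.createConversation();
  return toStream(conversation.sendMessageStreaming(prompt)).join();
}
```

With these stand-ins, `await generateOnce('hi')` yields the two chunks joined into a single string.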
Limitations (matches upstream early-preview status):
- Text-in/text-out only: vision, audio, and thinking options are ignored with a warning.
- LoRA throws UnsupportedError (parity with FFI path).
- stopGeneration() closes the local stream and calls the upstream conversation.cancel() to abort the JS-side generation (wrapped in try/catch, because the early-preview API may throw if nothing is in flight).
- For models >2 GB use WebStorageMode.streaming so the resolver returns an OpfsStreamModelSource; passing a Blob URL to Engine.create trips Chrome's ERR_BLOB_OUT_OF_MEMORY limit. The WebModelSourceResolver handles the routing transparently, the same path MediaPipeWebInferenceModel uses today.
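Two of the limitations above lend themselves to a short illustration. The block below is a minimal sketch under stated assumptions, not the class's actual code: StorageModeSketch, pickStorageMode, FakeConversation, and StoppableSessionSketch are hypothetical names, and the ">2 GB" threshold is assumed to mean 2 × 1024³ bytes. It shows a size-based choice of storage mode and a stop routine that closes the local stream and tolerates a throwing cancel().

```dart
import 'dart:async';

/// Local stand-in for the storage modes named on this page.
enum StorageModeSketch { cacheApi, streaming }

/// Assumption: ">2 GB" is read as 2 * 1024^3 bytes.
const int blobLimitBytes = 2 * 1024 * 1024 * 1024;

/// Picks streaming for models above the assumed Blob size limit.
StorageModeSketch pickStorageMode(int modelSizeBytes) =>
    modelSizeBytes > blobLimitBytes
        ? StorageModeSketch.streaming
        : StorageModeSketch.cacheApi;

/// Hypothetical conversation whose cancel() throws when idle,
/// imitating the early-preview behaviour described above.
class FakeConversation {
  bool inFlight = false;
  int cancelCalls = 0;

  void cancel() {
    cancelCalls++;
    if (!inFlight) throw StateError('no generation in flight');
    inFlight = false;
  }
}

class StoppableSessionSketch {
  StoppableSessionSketch(this.conversation);

  final FakeConversation conversation;
  final StreamController<String> controller = StreamController<String>();

  Future<void> stopGeneration() async {
    // Close the local stream first so listeners see completion.
    // Not awaited: close() only completes once a listener receives "done".
    if (!controller.isClosed) {
      unawaited(controller.close());
    }
    try {
      conversation.cancel();
    } catch (_) {
      // Nothing was in flight; stopping is treated as a no-op.
    }
  }
}
```

Swallowing the exception keeps stopGeneration() safe to call at any time, at the cost of hiding genuine cancellation failures; logging inside the catch block is a reasonable variation.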
Inheritance
- Object
- InferenceModel
- LiteRtLmWebInferenceModel
Constructors
- LiteRtLmWebInferenceModel({required WebModelSourceResolver sourceResolver, required int maxTokens, required ModelType modelType, ModelFileType fileType = ModelFileType.litertlm, int? maxConcurrentSessions, required VoidCallback onClose})
Properties
- activeBackend → PreferredBackend?
-
Backend that the runtime initialized for this model, when known.
no setter, override
- chat ↔ InferenceChat?
-
getter/setter pair, inherited
- fileType → ModelFileType
-
final
- generationMutex → Mutex
-
Serializes generation across all sessions on this model — concurrent
contexts, serialized inference, matching the FFI/MediaPipe paths. The
@litert-lm/core engine is shared WebGPU/WASM state; parallel generations
would contend for the accelerator. Passed to each session.
final
- hashCode → int
-
The hash code for this object.
no setter, inherited
- maxConcurrentSessions → int?
-
Cap on concurrent openSession sessions; null = unlimited.
final
- maxTokens → int
-
final
- modelType → ModelType
-
final
- onClose → VoidCallback
-
final
- runtimeType → Type
-
A representation of the runtime type of the object.
no setter, inherited
- session → InferenceModelSession?
-
The single session created via createSession. Singleton lane —
each createSession call overwrites this field with a new session
and closes the previous one.
no setter, override
- sessions → List<InferenceModelSession>
-
Live sessions owned by this model: the union of the legacy session
(if any) and every active openSession result. Returns an
unmodifiable view; mutate via openSession,
session.close(), or close.
no setter, override
- sourceResolver → WebModelSourceResolver
-
Shared with WebInferenceModel — resolves the active model into either
a BlobUrlModelSource (cacheApi/none) or OpfsStreamModelSource
(streaming). Engine-specific glue lives in _ensureEngine.
final
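The serialization that generationMutex provides can be illustrated with a small future-chain lock. This is a sketch only: SimpleMutex and demoSerializedGeneration are hypothetical names, and the Mutex type used by the class may behave differently in detail. The point is that two "generations" started concurrently still run one after the other.

```dart
import 'dart:async';

/// Minimal async lock: each caller waits for the previous holder to finish.
/// Hypothetical stand-in for the Mutex type referenced on this page.
class SimpleMutex {
  Future<void> _tail = Future<void>.value();

  Future<T> protect<T>(Future<T> Function() criticalSection) {
    final previous = _tail;
    final released = Completer<void>();
    _tail = released.future; // later callers queue behind this one
    return previous
        .then((_) => criticalSection())
        .whenComplete(() => released.complete());
  }
}

/// Starts two fake generations at once and records the order of events.
Future<List<String>> demoSerializedGeneration() async {
  final mutex = SimpleMutex();
  final log = <String>[];

  Future<void> generate(String name, int millis) => mutex.protect(() async {
        log.add('$name start');
        await Future<void>.delayed(Duration(milliseconds: millis));
        log.add('$name end');
      });

  // 'b' is much shorter, but it still has to wait for 'a' to release the lock.
  await Future.wait([generate('a', 20), generate('b', 1)]);
  return log;
}
```

Because the lock is released in whenComplete, a generation that throws does not block the sessions queued behind it.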
Methods
- close() → Future<void>
-
override
- createChat({double temperature = .8, int randomSeed = 1, int topK = 1, double? topP, int tokenBuffer = 256, String? loraPath, bool? supportImage, bool? supportAudio, List<Tool> tools = const [], bool? supportsFunctionCalls, bool isThinking = false, ModelType? modelType, ToolChoice toolChoice = ToolChoice.auto, int? maxFunctionBufferLength, String? systemInstruction}) → Future<InferenceChat>
-
inherited
- createSession({double temperature = .8, int randomSeed = 1, int topK = 1, double? topP, String? loraPath, bool? enableVisionModality, bool? enableAudioModality, String? systemInstruction, bool enableThinking = false, List<Tool> tools = const []}) → Future<InferenceModelSession>
-
Creates a new InferenceModelSession for generation.
override
- noSuchMethod(Invocation invocation) → dynamic
-
Invoked when a nonexistent method or property is accessed.
inherited
- openChat({double temperature = .8, int randomSeed = 1, int topK = 1, double? topP, int tokenBuffer = 256, String? loraPath, bool? supportImage, bool? supportAudio, List<Tool> tools = const [], bool? supportsFunctionCalls, bool isThinking = false, ModelType? modelType, ToolChoice toolChoice = ToolChoice.auto, int? maxFunctionBufferLength, String? systemInstruction}) → Future<InferenceChat>
-
Same as createChat, but uses openSession internally so the
resulting chat owns an independent session that does not touch the
legacy session field or other open chats. Use this when you need
concurrent chats on a single loaded model.
inherited
- openSession({double temperature = .8, int randomSeed = 1, int topK = 1, double? topP, String? loraPath, bool? enableVisionModality, bool? enableAudioModality, String? systemInstruction, bool enableThinking = false, List<Tool> tools = const []}) → Future<InferenceModelSession>
-
Opens a new session detached from session. Each call returns a
fresh independent session sharing the loaded model weights but with
isolated context (history / KV cache). Use this for concurrent
dialogues on a single loaded model.
override
- toString() → String
-
A string representation of this object.
inherited
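The difference between the createSession lane and the openSession lane can be modelled with a small bookkeeping class. This is a sketch under assumptions: SessionSketch and SessionRegistrySketch are hypothetical, the real methods are asynchronous and take sampling parameters, and throwing StateError when the cap is reached is a guess, since this page does not say what happens at the limit.

```dart
/// Hypothetical session that only tracks whether it was closed.
class SessionSketch {
  bool closed = false;
  void close() => closed = true;
}

class SessionRegistrySketch {
  SessionRegistrySketch({this.maxConcurrentSessions});

  /// Cap on openSession sessions; null means unlimited.
  final int? maxConcurrentSessions;

  SessionSketch? _legacy;
  final List<SessionSketch> _open = [];

  /// Singleton lane: closes the previous session and replaces it.
  SessionSketch createSession() {
    _legacy?.close();
    return _legacy = SessionSketch();
  }

  /// Independent lane: every call adds a new session, up to the cap.
  SessionSketch openSession() {
    final cap = maxConcurrentSessions;
    if (cap != null && _open.length >= cap) {
      // Assumption: the real class may report the limit differently.
      throw StateError('session limit of $cap reached');
    }
    final session = SessionSketch();
    _open.add(session);
    return session;
  }

  /// Closes an opened session and drops it from the live list.
  void release(SessionSketch session) {
    session.close();
    _open.remove(session);
  }

  /// Unmodifiable union of the legacy session (if any) and open sessions.
  List<SessionSketch> get sessions {
    final legacy = _legacy;
    return List<SessionSketch>.unmodifiable(
      [if (legacy != null) legacy, ..._open],
    );
  }
}
```

In this model a second createSession() call closes the first session, while sessions returned by openSession() stay alive until they are released individually.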
Operators
- operator ==(Object other) → bool
-
The equality operator.
inherited