createModel abstract method

Future<InferenceModel> createModel({
  required ModelType modelType,
  ModelFileType fileType = ModelFileType.task,
  int maxTokens = 1024,
  PreferredBackend? preferredBackend,
  List<int>? loraRanks,
  int? maxNumImages,
  bool supportImage = false,
  bool supportAudio = false,
  bool? enableSpeculativeDecoding,
  int? maxConcurrentSessions,
})

Creates and returns a new InferenceModel instance.

modelType: the model type to create.
maxTokens: the model's context window, i.e. the total number of tokens shared by the input (system prompt + history + current message) and the generated output. This is the KV-cache budget, not the maximum response length; to cap how much is generated, use maxOutputTokens on InferenceModel.createSession. .litertlm models require a context window of at least 1024 (their baked kv_cache_max_len); a smaller value is clamped up to 1024 to avoid a native tensor-allocation crash (#318). The default (1024) is safe for every supported model.
preferredBackend: backend preference (e.g., CPU, GPU).
loraRanks: optional list of supported LoRA ranks.
maxNumImages: maximum number of images (for multimodal models).
supportImage: whether the model supports images.
supportAudio: whether the model supports audio (Gemma 3n E4B only).
enableSpeculativeDecoding: Multi-Token Prediction toggle for Gemma 4 E2B/E4B (LiteRT-LM v0.11.0+). null honors the model's default; true or false forces it on or off. Older .litertlm files without an MTP drafter ignore this flag at the SDK level.
maxConcurrentSessions: optional cap on the number of sessions open at once via InferenceModel.openSession. null (the default) means no cap, which is backward-compatible. When set, the (cap+1)-th call to InferenceModel.openSession throws StateError. Use this on mobile with large models to guard against OOM from multiple concurrent KV caches.
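The following is a minimal, self-contained sketch of three behaviors described above: the shared input/output token budget, the clamp to 1024 for .litertlm models, and the StateError raised once the session cap is reached. Every name in it (kMinLitertlmContext, effectiveMaxTokens, remainingForOutput, SketchModel) is hypothetical and exists only for illustration; it does not reproduce the plugin's actual implementation. Leaving non-.litertlm values unchanged is an assumption.

```dart
// Hypothetical stand-ins for illustration; not part of the plugin's API.
const int kMinLitertlmContext = 1024;

/// Context window that would effectively apply. Values below 1024 are
/// clamped up for .litertlm models; other file types are left unchanged
/// here (an assumption of this sketch).
int effectiveMaxTokens(int requested, {required bool isLitertlm}) {
  if (isLitertlm && requested < kMinLitertlmContext) {
    return kMinLitertlmContext;
  }
  return requested;
}

/// Tokens left for generation once the input has used its share of the
/// context window. Never negative.
int remainingForOutput(int maxTokens, int inputTokens) {
  final left = maxTokens - inputTokens;
  return left < 0 ? 0 : left;
}

/// Toy model that enforces an optional cap on concurrently open sessions.
class SketchModel {
  SketchModel({this.maxConcurrentSessions});

  final int? maxConcurrentSessions; // null means no cap
  int _open = 0;

  int get openSessions => _open;

  void openSession() {
    final cap = maxConcurrentSessions;
    if (cap != null && _open >= cap) {
      throw StateError('Session cap of $cap reached');
    }
    _open++;
  }

  void closeSession() {
    if (_open > 0) _open--;
  }
}

void main() {
  print(effectiveMaxTokens(512, isLitertlm: true)); // clamped up
  print(effectiveMaxTokens(512, isLitertlm: false)); // unchanged
  print(remainingForOutput(1024, 700)); // budget left for the reply

  final model = SketchModel(maxConcurrentSessions: 2);
  model.openSession();
  model.openSession();
  try {
    model.openSession(); // the (cap+1)-th open
  } on StateError catch (e) {
    print('Rejected: ${e.message}');
  }
}
```

In this sketch a 700-token input against a 1024-token window leaves 324 tokens for the response, which is why maxTokens should be sized for the prompt and the reply together rather than for the reply alone.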

Implementation

Future<InferenceModel> createModel({
  required ModelType modelType,
  ModelFileType fileType = ModelFileType.task,
  int maxTokens = 1024,
  PreferredBackend? preferredBackend,
  List<int>? loraRanks,
  int? maxNumImages, // Maximum number of images (multimodal models)
  bool supportImage = false, // Whether the model supports images
  bool supportAudio = false, // Whether the model supports audio (Gemma 3n E4B only)
  bool? enableSpeculativeDecoding,
  int? maxConcurrentSessions,
});