createSession abstract method

Future<InferenceModelSession> createSession({
  double temperature = .8,
  int randomSeed = 1,
  int topK = 1,
  double? topP,
  String? loraPath,
  bool? enableVisionModality,
  bool? enableAudioModality,
  String? systemInstruction,
  bool enableThinking = false,
  List<Tool> tools = const [],
  int? maxOutputTokens,
})

Creates a new InferenceModelSession for generation.

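temperature, randomSeed, topK, topP: sampling parameters.
loraPath: optional path to a LoRA model.
enableVisionModality: enables the vision modality for multimodal models.
enableAudioModality: enables the audio modality for Gemma 3n E4B models.
enableThinking: enables thinking mode (Gemma 4, via extraContext).
tools: tools for native tool calling (Gemma 4, passed to the SDK as tools_json).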
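As a rough illustration of how the sampling parameters above fit together, a call might look like the sketch below. It assumes the method is reached through an already-loaded `InferenceModel` instance and that the types are exported from `package:flutter_gemma/flutter_gemma.dart`. The helper name `openSamplingSession` and all numeric values are hypothetical and not taken from this page.

```dart
// Assumption: the plugin's public types are exported from this library.
import 'package:flutter_gemma/flutter_gemma.dart';

/// Hypothetical helper: opens a session with explicit sampling settings.
/// `model` is assumed to be an InferenceModel that has already been created.
Future<InferenceModelSession> openSamplingSession(InferenceModel model) {
  return model.createSession(
    temperature: 0.7, // illustrative value; the default is .8
    randomSeed: 42, // illustrative value; the default is 1
    topK: 40, // illustrative value; the default is 1
    topP: 0.95, // optional; omitted (null) by default
  );
}
```

Every argument is optional, so a bare `model.createSession()` falls back to the defaults shown in the signature.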

maxOutputTokens: optional cap on the number of tokens this session generates per response. This is the generation length, which is distinct from the model's maxTokens (the whole context window, input plus output, passed to createModel/getActiveModel). To get a short reply, set maxOutputTokens (e.g. 100) and leave maxTokens at its default. Do NOT lower maxTokens to 100: .litertlm models require a context window of at least 1024 and fail to allocate tensors below it (#318). maxOutputTokens is currently honored on the .litertlm (FFI) path; the MediaPipe .task path has no session-level output cap and ignores the value (a log line is emitted).
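The distinction between the two limits can be sketched as a small pre-flight check in plain Dart. This is not part of the library: `checkTokenBudget` and `minLitertlmContextWindow` are hypothetical names, and the second rule (the output cap should be smaller than the context window) is an inference from the statement that the window covers input plus output.

```dart
/// Minimum context window for .litertlm models, per the note above (#318).
const int minLitertlmContextWindow = 1024;

/// Hypothetical pre-flight check, not a library API.
/// Returns a list of problems; an empty list means the pair looks sane.
List<String> checkTokenBudget({
  required int maxTokens, // context window: input + output
  int? maxOutputTokens, // per-response generation cap, if any
}) {
  final problems = <String>[];
  if (maxTokens < minLitertlmContextWindow) {
    problems.add(
      'maxTokens ($maxTokens) is below $minLitertlmContextWindow; '
      '.litertlm models may fail to allocate tensors.',
    );
  }
  // Assumption: a cap at or above the window leaves no room for input.
  if (maxOutputTokens != null && maxOutputTokens >= maxTokens) {
    problems.add(
      'maxOutputTokens ($maxOutputTokens) leaves no room for input '
      'in a context window of $maxTokens.',
    );
  }
  return problems;
}
```

Under these assumptions, `checkTokenBudget(maxTokens: 2048, maxOutputTokens: 100)` reports no problems, while lowering `maxTokens` to 100 to force a short reply is flagged. The intended pattern is to keep the context window large and pass `maxOutputTokens: 100` to `createSession`.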

Implementation

Future<InferenceModelSession> createSession({
  double temperature = .8,
  int randomSeed = 1,
  int topK = 1,
  double? topP,
  String? loraPath,
  bool? enableVisionModality, // Vision modality for multimodal models
  bool? enableAudioModality, // Audio modality (Gemma 3n E4B)
  String? systemInstruction,
  bool enableThinking =
      false, // Thinking mode (Gemma 4 via extraContext)
  List<Tool> tools =
      const [], // Native tool calling (Gemma 4 → SDK tools_json)
  int? maxOutputTokens, // Caps GENERATED tokens (not the context window)
});