createSession method - InferenceModel class - flutter_gemma_interface library

Creates a new InferenceModelSession for generation.

temperature, randomSeed, topK, topP: sampling parameters.
loraPath: optional path to a LoRA model.
enableVisionModality: enables the vision modality for multimodal models.
enableAudioModality: enables the audio modality for Gemma 3n E4B models.

maxOutputTokens: optional cap on the number of tokens this session generates per response. This is the generation length, distinct from the model's maxTokens, which is the whole context window (input + output) passed to createModel/getActiveModel. To get a short reply, set maxOutputTokens (e.g. 100) and leave maxTokens at its default. Do not lower maxTokens to 100: .litertlm models require a context window of at least 1024 tokens and fail to allocate tensors below it (#318). The cap is currently honored on the .litertlm (FFI) path; the MediaPipe .task path has no session-level output cap and ignores the parameter (a log line is emitted).
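The distinction between the two limits may be easier to see in a call site. The following is a minimal sketch, not the package's canonical usage: shortReply is a hypothetical helper, and the import path and the session calls (addQueryChunk, getResponse, close, Message.text) are assumptions based on the package's public session API rather than anything stated on this page. The model is taken as a parameter so that its maxTokens, set when it was created, stays untouched.

```dart
// Assumption: the package's main library exports InferenceModel,
// InferenceModelSession and Message.
import 'package:flutter_gemma/flutter_gemma.dart';

/// Hypothetical helper (not part of flutter_gemma): requests a short reply
/// by capping generation length on the session, not on the model.
Future<String> shortReply(InferenceModel model, String prompt) async {
  // Only the generated-token cap is set here. The context window
  // (maxTokens) was fixed when the model was created and is left alone.
  final session = await model.createSession(maxOutputTokens: 100);
  try {
    // Assumption: these session methods exist with these signatures.
    await session.addQueryChunk(Message.text(text: prompt, isUser: true));
    return await session.getResponse();
  } finally {
    // Release the session whether or not generation succeeded.
    await session.close();
  }
}
```

On the MediaPipe .task path the same call would still succeed, but the cap would be ignored, so callers that depend on short replies may want to truncate or stop generation themselves there.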

Implementation

Future<InferenceModelSession> createSession({
  double temperature = .8,
  int randomSeed = 1,
  int topK = 1,
  double? topP,
  String? loraPath,
  bool? enableVisionModality, // Add vision modality support
  bool? enableAudioModality, // Add audio modality support (Gemma 3n E4B)
  String? systemInstruction,
  bool enableThinking = false, // Enable thinking mode (Gemma 4 via extraContext)
  List<Tool> tools = const [], // Native tool calling (Gemma 4 → SDK tools_json)
  int? maxOutputTokens, // Cap GENERATED tokens (not the context window)
});
