createSession abstract method
Creates a new InferenceModelSession for generation.
temperature, randomSeed, topK, topP: sampling parameters (defaults, where
any, appear in the signature below).
loraPath: optional path to a LoRA model.
enableVisionModality: enables the vision modality for multimodal models.
enableAudioModality: enables the audio modality for Gemma 3n E4B models.
systemInstruction: optional system instruction for the session.
enableThinking: enables thinking mode (Gemma 4, via extraContext); defaults
to false.
tools: tools for native tool calling (Gemma 4, passed to the SDK as
tools_json); defaults to an empty list.
maxOutputTokens: optional cap on how many tokens this session generates
per response. This is the GENERATION length, distinct from the model's
maxTokens, which is the whole CONTEXT WINDOW (input + output) passed to
createModel/getActiveModel. To make the model produce a short reply, set
maxOutputTokens (e.g. 100) and leave maxTokens at its default. Do NOT
lower maxTokens to 100: .litertlm models require a context window of at
least 1024 and fail to allocate tensors below it (#318). Currently
honored only on the .litertlm (FFI) path; the MediaPipe .task path has no
session-level output cap, so it ignores the parameter and emits a log line.
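The maxTokens/maxOutputTokens distinction can be checked before a model is created. The sketch below is a hypothetical pure-Dart helper, not part of flutter_gemma: ModelFormat, checkTokenConfig and kMinLitertlmContextWindow are illustrative names. The 1024 floor and the .task behavior come from the description above; flagging a cap that fills the whole context window is an added assumption.

```dart
/// Hypothetical (not a flutter_gemma type): which inference path a model uses.
enum ModelFormat { litertlm, task }

/// Minimum context window stated above for .litertlm models (#318).
const int kMinLitertlmContextWindow = 1024;

/// Hypothetical helper: returns human-readable problems with a planned
/// token configuration. An empty list means nothing was flagged.
List<String> checkTokenConfig({
  required ModelFormat format,
  required int maxTokens, // Context window: input + output.
  int? maxOutputTokens, // Per-response generation cap.
}) {
  final problems = <String>[];
  if (format == ModelFormat.litertlm &&
      maxTokens < kMinLitertlmContextWindow) {
    problems.add('maxTokens $maxTokens is below the .litertlm minimum of '
        '$kMinLitertlmContextWindow; tensor allocation will fail.');
  }
  if (maxOutputTokens != null) {
    if (maxOutputTokens <= 0) {
      problems.add('maxOutputTokens must be positive.');
    } else if (maxOutputTokens >= maxTokens) {
      // Assumption: a cap that fills the whole window leaves no room
      // for the prompt, so it is flagged here.
      problems.add('maxOutputTokens $maxOutputTokens leaves no room for '
          'the prompt in a $maxTokens-token context window.');
    }
    if (format == ModelFormat.task) {
      problems.add('maxOutputTokens is ignored on the MediaPipe .task path.');
    }
  }
  return problems;
}
```

For example, a .litertlm configuration with maxTokens: 1024 and maxOutputTokens: 100 passes with no problems, while maxTokens: 100 is flagged as below the minimum.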
Implementation
Future<InferenceModelSession> createSession({
  double temperature = .8,
  int randomSeed = 1,
  int topK = 1,
  double? topP,
  String? loraPath,
  bool? enableVisionModality, // Vision input for multimodal models
  bool? enableAudioModality, // Audio input (Gemma 3n E4B)
  String? systemInstruction,
  bool enableThinking = false, // Thinking mode (Gemma 4 via extraContext)
  List<Tool> tools = const [], // Native tool calling (Gemma 4 → SDK tools_json)
  int? maxOutputTokens, // Caps GENERATED tokens, not the context window
});
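To illustrate what temperature, topK, topP and randomSeed control, the following hypothetical function shows the conventional way these settings combine to pick one token from a vector of logits. It is a sketch under that assumption, not the MediaPipe or LiteRT-LM implementation, which may order or clamp these steps differently; sampleToken is an illustrative name, and a seeded Random stands in for randomSeed.

```dart
import 'dart:math';

/// Hypothetical sketch: picks a token id from [logits] using the
/// conventional temperature -> top-k -> top-p pipeline. Assumes [logits]
/// is non-empty. Not the MediaPipe or LiteRT-LM implementation.
int sampleToken(
  List<double> logits, {
  double temperature = 0.8,
  int topK = 1,
  double? topP,
  required Random rng,
}) {
  // Temperature scaling; guard against division by zero.
  final t = temperature <= 0 ? 1e-6 : temperature;
  // Pair each token id with its scaled logit, highest first.
  final scored = [
    for (var i = 0; i < logits.length; i++) MapEntry(i, logits[i] / t),
  ]..sort((a, b) => b.value.compareTo(a.value));

  // Top-k: keep the k highest-scoring tokens (at least one).
  final kept = scored.take(min(max(topK, 1), scored.length)).toList();

  // Softmax over the kept tokens; subtracting the max avoids overflow.
  final maxScore = kept.first.value;
  final weights = [for (final e in kept) exp(e.value - maxScore)];
  final total = weights.reduce((a, b) => a + b);
  final probs = [for (final w in weights) w / total];

  // Top-p (nucleus): keep the shortest prefix whose mass reaches topP.
  var n = probs.length;
  if (topP != null) {
    var cumulative = 0.0;
    for (var i = 0; i < probs.length; i++) {
      cumulative += probs[i];
      if (cumulative >= topP) {
        n = i + 1;
        break;
      }
    }
  }

  // Draw from the surviving tokens, renormalized by their total mass.
  final mass = probs.take(n).reduce((a, b) => a + b);
  var r = rng.nextDouble() * mass;
  for (var i = 0; i < n; i++) {
    r -= probs[i];
    if (r <= 0) return kept[i].key;
  }
  return kept[n - 1].key; // Fallback for floating-point rounding.
}
```

Under this conventional reading, the default topK = 1 keeps only the highest-scoring token, so sampling becomes greedy and temperature and topP have no effect; raising topK is what lets them matter. Reusing the same seed reproduces the same choice.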