onde_inference library

On-device LLM inference SDK for Flutter & Dart.

Runs Qwen models locally, using Metal on Apple platforms and CPU inference on Android, Linux, and Windows. Inference involves no cloud round trip, and no user data leaves the device. Powered by the Onde Rust engine and mistral.rs.

Quick start

import 'package:flutter/widgets.dart';
import 'package:onde_inference/onde_inference.dart';

Future<void> main() async {
  WidgetsFlutterBinding.ensureInitialized();
  await OndeInference.init();

  // Load the default model with a system prompt.
  final engine = OndeChatEngine();
  await engine.loadDefaultModel(
    systemPrompt: 'You are a helpful assistant.',
  );

  // One-shot request: await the complete reply.
  final result = await engine.sendMessage(message: 'Hello!');
  print(result.text);

  // Streaming request: append each delta until the final chunk arrives.
  final buffer = StringBuffer();
  await for (final chunk in engine.streamMessage(message: 'Tell me a short story.')) {
    buffer.write(chunk.delta);
    if (chunk.done) break;
  }
  print(buffer.toString());

  // Release the model once it is no longer needed.
  await engine.unloadModel();
}
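
The streaming loop in the quick start relies on a general Dart behaviour: leaving an `await for` loop with `break` cancels the stream subscription, so nothing after the final chunk is consumed. The sketch below reproduces the same accumulate-until-done pattern in plain Dart so that it runs without the SDK. `FakeChunk` and `fakeStream` are hypothetical stand-ins for StreamChunk and engine.streamMessage; they are not part of the package, and they model only the `delta` and `done` fields used above.

```dart
// Hypothetical stand-in for the SDK's StreamChunk; only the two fields
// used in the quick start (delta, done) are modelled.
class FakeChunk {
  const FakeChunk(this.delta, {this.done = false});
  final String delta;
  final bool done;
}

// Hypothetical stand-in for engine.streamMessage: emits a few deltas,
// marks one as final, then deliberately emits one more chunk.
Stream<FakeChunk> fakeStream() async* {
  yield const FakeChunk('Once ');
  yield const FakeChunk('upon ');
  yield const FakeChunk('a time.', done: true);
  yield const FakeChunk(' (never read)');
}

// Concatenates deltas until a chunk reports done, then stops listening.
Future<String> collect(Stream<FakeChunk> chunks) async {
  final buffer = StringBuffer();
  await for (final chunk in chunks) {
    buffer.write(chunk.delta);
    if (chunk.done) break; // break cancels the underlying subscription
  }
  return buffer.toString();
}

Future<void> main() async {
  print(await collect(fakeStream())); // prints "Once upon a time."
}
```

Keeping the accumulation in a helper such as `collect` (a name chosen for this sketch) makes the loop easy to unit-test against a fake stream before wiring it to a real engine.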

Selecting a model

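// Preset model configurations exposed by OndeInference.
final config = OndeInference.defaultModelConfig();
final coderConfig = OndeInference.qwen25Coder3bConfig();

// Load the chosen preset into an existing engine.
await engine.loadGgufModel(config: coderConfig);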

Customising sampling

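// Apply a deterministic preset to an existing engine.
await engine.setSampling(
  sampling: OndeInference.deterministicSamplingConfig(),
);

// Or pass a custom configuration when loading the model.
await engine.loadDefaultModel(
  sampling: SamplingConfig(
    temperature: 0.5,
    maxTokens: BigInt.from(256),
  ),
);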
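
The class summary for SamplingConfig states that every field is optional and that an unset field means "use the engine default". The following plain-Dart sketch illustrates that convention only; it does not show how the engine actually resolves values. `FakeSampling`, `engineDefaults`, and `resolve` are hypothetical names, and the default values are invented for the example.

```dart
// Hypothetical stand-in for SamplingConfig, modelling only the two
// fields that appear in the example above.
class FakeSampling {
  const FakeSampling({this.temperature, this.maxTokens});
  final double? temperature;
  final BigInt? maxTokens;
}

// Invented defaults, used only to make the sketch runnable.
final engineDefaults = FakeSampling(
  temperature: 0.7,
  maxTokens: BigInt.from(512),
);

// Field-by-field fallback: a null field takes the default value.
FakeSampling resolve(FakeSampling requested, FakeSampling defaults) {
  return FakeSampling(
    temperature: requested.temperature ?? defaults.temperature,
    maxTokens: requested.maxTokens ?? defaults.maxTokens,
  );
}

void main() {
  // Only temperature is set, so maxTokens falls back to the default.
  final r = resolve(const FakeSampling(temperature: 0.5), engineDefaults);
  print('${r.temperature} ${r.maxTokens}'); // prints "0.5 512"
}
```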

Error handling

The generated bridge throws OndeError values directly:

try {
  await engine.sendMessage(message: '...');
} on OndeError catch (e) {
  debugPrint('Inference error: $e');
}

See the package README and the example app for platform-specific setup, cache configuration, and end-to-end Flutter integration.

Classes

ChatMessage
A single message in a conversation.
EngineInfo
A point-in-time snapshot of the engine's state.
GgufModelConfig
Configuration for loading a pre-quantised GGUF model.
InferenceResult
The result of a completed inference request.
OndeChatEngine
OndeInference
Static helper namespace for Onde SDK initialisation and configuration.
RustLib
Main entrypoint of the Rust API.
SamplingConfig
Sampling parameters for text generation. All fields are optional; an unset field (null in Dart, None on the Rust side) means "use the engine default".
StreamChunk
A single streaming token chunk emitted during inference.
ToolCallInfo
Structured tool-call request emitted by the model.

Enums

ChatRole
Role of a participant in a chat conversation.
EngineStatus
Lifecycle status of the inference engine.

Extensions

ChatMessageX on ChatMessage
Convenience factory constructors for ChatMessage.
EngineInfoX on EngineInfo
Convenience helpers on EngineInfo.
EngineStatusX on EngineStatus
Convenience helpers on EngineStatus.
InferenceResultToolsX on InferenceResult
Convenience helpers on InferenceResult for tool calling.
InferenceResultX on InferenceResult
Convenience helpers on InferenceResult.
OndeChatEngineX on OndeChatEngine
High-level wrapper around the api.OndeChatEngine opaque type generated by flutter_rust_bridge (FRB).
SamplingConfigX on SamplingConfig
Sampling configuration presets for common use-cases.

Functions

configureCacheDir({required String appDataDir}) → void
Seed the Hugging Face cache environment for sandboxed platforms.

Exceptions / Errors

OndeError
OndeException
Optional Dart Exception wrapper around an OndeError.