onde_inference library

On-device LLM inference SDK for Flutter & Dart.

Runs Qwen models locally, using Metal on Apple platforms and CPU inference on Android, Linux, and Windows. There is no cloud hop, and no user data leaves the device. Powered by the Onde Rust engine and mistral.rs.

Quick start

import 'package:flutter/widgets.dart';
import 'package:onde_inference/onde_inference.dart';

Future<void> main() async {
  // Set up the Flutter binding, then initialise the SDK.
  WidgetsFlutterBinding.ensureInitialized();
  await OndeInference.init();

  // Create an engine and load the default model with a system prompt.
  final engine = OndeChatEngine();
  await engine.loadDefaultModel(
    systemPrompt: 'You are a helpful assistant.',
  );

  // One-shot request: await the complete reply.
  final result = await engine.sendMessage(message: 'Hello!');
  print(result.text);

  // Streaming request: append each delta until the chunk marked done.
  final buffer = StringBuffer();
  await for (final chunk in engine.streamMessage(message: 'Tell me a short story.')) {
    buffer.write(chunk.delta);
    if (chunk.done) break;
  }
  print(buffer.toString());

  // Unload the model when finished.
  await engine.unloadModel();
}
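The streaming loop in the quick start can be tried without loading a model. The sketch below is only an illustration: FakeChunk and fakeStream are hypothetical stand-ins (not part of onde_inference) that copy just the delta and done members used above, and the chunk contents are made up.

```dart
/// Hypothetical stand-in for the SDK's StreamChunk. Only `delta` and `done`
/// are taken from the quick start; everything else here is an assumption.
class FakeChunk {
  const FakeChunk(this.delta, {this.done = false});
  final String delta;
  final bool done;
}

/// Simulates a token stream so the loop can run without a model.
Stream<FakeChunk> fakeStream() async* {
  yield const FakeChunk('Once ');
  yield const FakeChunk('upon ');
  yield const FakeChunk('a time.', done: true);
  // Not consumed by a reader that stops at the chunk flagged `done`.
  yield const FakeChunk(' (ignored)');
}

/// Concatenates deltas up to and including the chunk flagged `done`.
Future<String> collect(Stream<FakeChunk> chunks) async {
  final buffer = StringBuffer();
  await for (final chunk in chunks) {
    buffer.write(chunk.delta);
    if (chunk.done) break; // leaving the loop cancels the subscription
  }
  return buffer.toString();
}

Future<void> main() async {
  print(await collect(fakeStream()));
}
```

Writing the delta before checking done matters in this sketch: the final chunk still carries text, so checking first would drop it.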

Selecting a model

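// Built-in model configurations exposed by OndeInference.
final config = OndeInference.defaultModelConfig();
final coderConfig = OndeInference.qwen25Coder3bConfig();

// Load the chosen GGUF model instead of the default one.
await engine.loadGgufModel(config: coderConfig);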

Customising sampling

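// Switch the engine to the deterministic sampling preset.
await engine.setSampling(
  sampling: OndeInference.deterministicSamplingConfig(),
);

// Or pass custom sampling parameters when loading the model.
await engine.loadDefaultModel(
  sampling: SamplingConfig(
    temperature: 0.5,
    maxTokens: BigInt.from(256),
  ),
);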

Error handling

The generated bridge throws OndeError values directly:

try {
  await engine.sendMessage(message: '...');
} on OndeError catch (e) {
  debugPrint('Inference error: $e');
}
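The same try/catch shape also applies around an await for loop, because in Dart an error emitted by a stream is rethrown at the loop. Whether streamMessage reports failures this way is an assumption, not something this page states. FakeOndeError and failingStream below are hypothetical stand-ins so the sketch runs without the plugin.

```dart
/// Hypothetical stand-in for OndeError; not part of onde_inference.
class FakeOndeError implements Exception {
  FakeOndeError(this.message);
  final String message;

  @override
  String toString() => 'FakeOndeError: $message';
}

/// Emits two deltas, then fails, to mimic an error in mid-generation.
Stream<String> failingStream() async* {
  yield 'Partial ';
  yield 'answer';
  throw FakeOndeError('generation aborted');
}

/// Returns the text received before any failure and logs the error.
Future<String> collectSafely(Stream<String> deltas) async {
  final buffer = StringBuffer();
  try {
    await for (final delta in deltas) {
      buffer.write(delta);
    }
  } on FakeOndeError catch (e) {
    // The stream's error surfaces here, after the earlier deltas were kept.
    print('Inference error: $e');
  }
  return buffer.toString();
}

Future<void> main() async {
  final text = await collectSafely(failingStream());
  print('Text so far: $text');
}
```

Keeping the buffer outside the try block lets the caller show the partial reply even when generation fails.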

See the package README and the example app for platform-specific setup, cache configuration, and end-to-end Flutter integration.

Classes

ChatMessage

A single message in a conversation.

EngineInfo

A point-in-time snapshot of the engine's state.

GgufModelConfig

Configuration for loading a pre-quantised GGUF model.

InferenceResult

The result of a completed inference request.

OndeChatEngine

OndeInference

Static helper namespace for Onde SDK initialisation and configuration.

RustLib

Main entrypoint of the Rust API.

SamplingConfig

Sampling parameters for text generation. All fields are optional; an unset field (None in Rust, null in Dart) means "use the engine default".

StreamChunk

A single streaming token chunk emitted during inference.

ToolCallInfo

Structured tool-call request emitted by the model.

Extensions
