onde_inference library
On-device LLM inference SDK for Flutter & Dart.
Runs Qwen models locally, using Metal on Apple platforms and CPU inference on Android, Linux, and Windows. Inference makes no cloud round trip, and no user data leaves the device. Powered by the Onde Rust engine and mistral.rs.
Quick start
import 'package:flutter/widgets.dart';
import 'package:onde_inference/onde_inference.dart';

Future<void> main() async {
  WidgetsFlutterBinding.ensureInitialized();
  // Initialise the SDK before creating an engine.
  await OndeInference.init();

  final engine = OndeChatEngine();
  await engine.loadDefaultModel(
    systemPrompt: 'You are a helpful assistant.',
  );

  // One-shot request: await the complete reply.
  final result = await engine.sendMessage(message: 'Hello!');
  print(result.text);

  // Streaming request: append each delta until the final chunk arrives.
  final buffer = StringBuffer();
  await for (final chunk in engine.streamMessage(message: 'Tell me a short story.')) {
    buffer.write(chunk.delta);
    if (chunk.done) break;
  }
  print(buffer.toString());

  // Release the model when finished.
  await engine.unloadModel();
}
Selecting a model
// The SDK's default model configuration.
final config = OndeInference.defaultModelConfig();

// Or use a built-in preset and load it explicitly.
final coderConfig = OndeInference.qwen25Coder3bConfig();
await engine.loadGgufModel(config: coderConfig);
Customising sampling
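// Apply a sampling preset to an existing engine.
await engine.setSampling(
  sampling: OndeInference.deterministicSamplingConfig(),
);

// Or pass explicit sampling parameters when loading the model.
await engine.loadDefaultModel(
  sampling: SamplingConfig(
    temperature: 0.5,
    maxTokens: BigInt.from(256),
  ),
);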
Error handling
The generated bridge throws OndeError values directly:
try {
  await engine.sendMessage(message: '...');
} on OndeError catch (e) {
  debugPrint('Inference error: $e');
}
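As a minimal sketch of building on the pattern above, the helper below returns the reply text, or a fallback string when inference fails. The name replyOrFallback and its fallback parameter are hypothetical and not part of the SDK; the sketch also assumes that result.text is a non-null String, as the quick start's print(result.text) suggests.

```dart
import 'package:flutter/foundation.dart';
import 'package:onde_inference/onde_inference.dart';

/// Hypothetical helper (not part of the SDK): sends [prompt] and returns the
/// reply text, or [fallback] if the engine throws an OndeError.
Future<String> replyOrFallback(
  OndeChatEngine engine,
  String prompt, {
  String fallback = 'Sorry, something went wrong.',
}) async {
  try {
    final result = await engine.sendMessage(message: prompt);
    // Assumption: result.text is a non-null String.
    return result.text;
  } on OndeError catch (e) {
    // Log the failure and return the fallback instead of rethrowing.
    debugPrint('Inference error: $e');
    return fallback;
  }
}
```

A caller could then write final reply = await replyOrFallback(engine, 'Hello!'); and show the result without a try/catch of its own. Errors other than OndeError still propagate, which keeps unrelated bugs visible.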
See the package README and the example app for platform-specific setup, cache configuration, and end-to-end Flutter integration.
Classes
- ChatMessage
- A single message in a conversation.
- EngineInfo
- A point-in-time snapshot of the engine's state.
- GgufModelConfig
- Configuration for loading a pre-quantised GGUF model.
- InferenceResult
- The result of a completed inference request.
- OndeChatEngine
- OndeInference
- Static helper namespace for Onde SDK initialisation and configuration.
- RustLib
- Main entrypoint of the Rust API
- SamplingConfig
- Sampling parameters for text generation. All fields are optional; None (null in Dart) means "use the engine default".
- StreamChunk
- A single streaming token chunk emitted during inference.
- ToolCallInfo
- Structured tool-call request emitted by the model.
Enums
- ChatRole
- Role of a participant in a chat conversation.
- EngineStatus
- Lifecycle status of the inference engine.
Extensions
- ChatMessageX on ChatMessage
- Convenience factory constructors for ChatMessage.
- EngineInfoX on EngineInfo
- Convenience helpers on EngineInfo.
- EngineStatusX on EngineStatus
- Convenience helpers on EngineStatus.
- InferenceResultToolsX on InferenceResult
- Convenience helpers on InferenceResult for tool calling.
- InferenceResultX on InferenceResult
- Convenience helpers on InferenceResult.
- OndeChatEngineX on OndeChatEngine
- High-level wrapper around the api.OndeChatEngine opaque type generated by flutter_rust_bridge (FRB).
- SamplingConfigX on SamplingConfig
- Sampling configuration presets for common use-cases.
Functions
- configureCacheDir({required String appDataDir}) → void
- Seed the HuggingFace cache environment for sandboxed platforms.
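The sketch below shows one way configureCacheDir might be wired into app start-up. It rests on several assumptions that the package README should confirm: that the function is called after OndeInference.init() and before any model is loaded, that the app support directory is a suitable value for appDataDir, and that the separate path_provider package is used to look that directory up.

```dart
import 'package:flutter/widgets.dart';
import 'package:onde_inference/onde_inference.dart';
import 'package:path_provider/path_provider.dart';

Future<void> main() async {
  WidgetsFlutterBinding.ensureInitialized();
  await OndeInference.init();

  // Assumption: the app support directory is a writable, app-private
  // location that suits sandboxed platforms.
  final appDir = await getApplicationSupportDirectory();

  // Assumption: seed the cache before loading a model so that model files
  // are read from and written to this directory.
  configureCacheDir(appDataDir: appDir.path);

  final engine = OndeChatEngine();
  await engine.loadDefaultModel(
    systemPrompt: 'You are a helpful assistant.',
  );
}
```

On desktop platforms without a sandbox this step may be unnecessary; the signature above only states that the function targets sandboxed platforms.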
Exceptions / Errors
- OndeError
- OndeException
- Optional Dart Exception wrapper around an OndeError.