mobile_rag_engine 0.3.2
A high-performance, on-device RAG (Retrieval-Augmented Generation) engine for Flutter. Run semantic search completely offline on iOS and Android with HNSW vector indexing.
Mobile RAG Engine #
A Flutter package for fully local RAG (Retrieval-Augmented Generation) on mobile devices.
Why I Built This #
Implementing AI-powered search on mobile typically requires a server. Embedding generation, vector storage, similarity search—all handled server-side, with the app just making API calls.
But this approach has problems:
- No internet, no functionality
- User data gets sent to servers
- Ongoing server costs
So I found a way to do everything on-device.
Technical Challenges #
I first tried pure Dart. Loading ONNX models, tokenizing, generating embeddings—it was too slow. Vector search became noticeably laggy with just 1,000 documents.
So I brought in Rust.
Rust + Flutter Architecture #
📖 Architecture Guide - Detailed system architecture documentation
Flutter (Dart)
    │
    ├── EmbeddingService (ONNX Runtime)
    │       └── text → 384-dim vector
    │
    └── flutter_rust_bridge (FFI)
            │
            ▼
        Rust
        ├── Tokenizer (HuggingFace tokenizers)
        ├── SQLite (vector storage)
        └── HNSW Index (O(log n) search)
Rust's tokenizers crate is 10x+ faster than Dart for tokenization. Vector search improved from O(n) to O(log n) using the instant-distance HNSW implementation.
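For contrast, here is a minimal sketch of the O(n) baseline that an HNSW index replaces: a linear scan that scores every stored vector against the query. It assumes cosine similarity as the metric, which the text above does not state, and the function names `cosine_similarity` and `brute_force_top_k` are made up for illustration. This is not the package's actual search code.

```rust
/// Cosine similarity between two equal-length vectors.
/// Returns 0.0 if either vector has zero norm.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Linear-scan top-k search (hypothetical helper, not the package API).
/// Every query touches all n stored vectors, so the cost grows at least
/// linearly with the number of documents.
/// Returns (document index, similarity) pairs, best match first.
fn brute_force_top_k(query: &[f32], docs: &[Vec<f32>], k: usize) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = docs
        .iter()
        .enumerate()
        .map(|(i, d)| (i, cosine_similarity(query, d)))
        .collect();
    // Highest similarity first; total_cmp gives a total order on f32.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}
```

An approximate index such as HNSW avoids visiting every vector by walking a layered neighbor graph, trading a small amount of recall for much less work per query.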
How It Differs #
vs. Server-based RAG #
- Works completely offline
- Data never leaves the device
- Zero network latency
vs. Pure Dart Implementation #
- Native Rust performance
- HNSW enables fast search even with large document sets
- Memory-efficient vector storage
vs. Existing Flutter Vector DBs #
- Direct ONNX model loading (no external APIs needed)
- Swappable models for Korean/multilingual support
- Integrated pipeline from embedding to search
Performance #
Tested on iOS Simulator (Apple Silicon Mac):
| Operation | Time |
|---|---|
| Tokenization (short text) | 0.8ms |
| Embedding generation (short text) | 4ms |
| Embedding generation (long text) | 36ms |
| HNSW search (100 docs) | 1ms |
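Search over 100 documents takes about 1ms. Because HNSW search time grows roughly logarithmically with index size, real-time search should remain feasible at 10,000+ documents, although the table above only measures 100.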
Problems Solved During Development #
1. iOS Cross-Compilation #
Initially, the onig regex library blocked iOS builds with a missing-symbol error for ___chkstk_darwin. Switching to the pure-Rust fancy-regex crate fixed it.
2. HNSW Index Timing #
Rebuilding the HNSW index on every document insert repeats work proportional to the index size each time, so n inserts cost O(n²) or worse overall. The index is now rebuilt once after bulk inserts.
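The deferred-rebuild idea can be sketched as follows. `LazyIndex` is a hypothetical toy type: its `rebuild` method only stands in for real HNSW construction, and `rebuild_count` exists purely to make the cost visible. It is a sketch of the pattern, not the package's implementation.

```rust
/// Toy stand-in for a vector index whose build step is expensive.
struct LazyIndex {
    vectors: Vec<Vec<f32>>,
    /// How many vectors the last build covered.
    indexed_len: usize,
    /// Number of builds performed so far (for illustration only).
    rebuild_count: usize,
}

impl LazyIndex {
    fn new() -> Self {
        LazyIndex { vectors: Vec::new(), indexed_len: 0, rebuild_count: 0 }
    }

    /// Insert only stores the vector; it never touches the index.
    fn insert(&mut self, v: Vec<f32>) {
        self.vectors.push(v);
    }

    /// True when vectors were added since the last build.
    fn is_stale(&self) -> bool {
        self.indexed_len != self.vectors.len()
    }

    /// Placeholder for the real (expensive) index construction.
    fn rebuild(&mut self) {
        self.indexed_len = self.vectors.len();
        self.rebuild_count += 1;
    }

    /// Rebuild only if something changed since the last build.
    fn ensure_fresh(&mut self) {
        if self.is_stale() {
            self.rebuild();
        }
    }
}
```

With this shape, inserting 1,000 vectors and then calling `ensure_fresh` triggers one build instead of 1,000.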
3. Duplicate Document Handling #
Identical documents caused duplicates in search results. Added SHA256 content hashing to skip already-stored documents.
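A minimal sketch of hash-based duplicate skipping, using only the standard library. Note the substitution: the engine uses SHA256, while this example uses `DefaultHasher` as a stand-in to avoid an external crate, and `DefaultHasher` is not collision-resistant. `DedupStore` and `content_fingerprint` are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Stand-in content fingerprint. NOT SHA-256: DefaultHasher is used only
/// to keep this sketch dependency-free and is not collision-resistant.
fn content_fingerprint(content: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    content.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical in-memory store that skips content it has already seen.
struct DedupStore {
    seen: HashSet<u64>,
    documents: Vec<String>,
}

impl DedupStore {
    fn new() -> Self {
        DedupStore { seen: HashSet::new(), documents: Vec::new() }
    }

    /// Returns true if the document was stored, false if it was skipped
    /// because identical content is already present.
    fn add(&mut self, content: &str) -> bool {
        // HashSet::insert returns false when the value was already present.
        if !self.seen.insert(content_fingerprint(content)) {
            return false;
        }
        self.documents.push(content.to_string());
        true
    }
}
```

In a SQLite-backed store the same check would more likely be a unique column holding the hash, which is an assumption about the design and is not confirmed by the text above.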
4. ONNX Runtime Thread Safety #
Tried parallel batch embedding, but onnxruntime's OrtSession isn't thread-safe. Switched to sequential processing—still fast enough for real-world use since individual embeddings are quick.
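The sketch below shows sequential processing next to one generic alternative: guarding a single session with a `Mutex` so that callers on several threads still run one at a time. `MockSession` is a hypothetical stand-in that returns fake vectors; it is not the onnxruntime API, and the Mutex variant is an illustration, not something the package is known to do.

```rust
use std::sync::Mutex;
use std::thread;

/// Hypothetical stand-in for an inference session that must not be used
/// from two threads at once. `run` takes `&mut self`, so the compiler
/// already rejects unsynchronized sharing.
struct MockSession {
    calls: usize,
}

impl MockSession {
    /// Fake "embedding": [text length in bytes, call number].
    fn run(&mut self, text: &str) -> Vec<f32> {
        self.calls += 1;
        vec![text.len() as f32, self.calls as f32]
    }
}

/// Sequential processing: one text after another on the calling thread.
fn embed_sequential(session: &mut MockSession, texts: &[&str]) -> Vec<Vec<f32>> {
    texts.iter().map(|t| session.run(t)).collect()
}

/// Alternative: one worker thread per text, with the Mutex letting only
/// one of them use the session at a time. Results keep input order.
fn embed_from_threads(session: &Mutex<MockSession>, texts: &[&str]) -> Vec<Vec<f32>> {
    thread::scope(|s| {
        let handles: Vec<_> = texts
            .iter()
            .map(|t| s.spawn(move || session.lock().unwrap().run(t)))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}
```

Since the lock serializes every call, the threaded variant gains no inference throughput over the plain loop; it only makes shared access safe.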
Usage #
Installation #
Prerequisites: Rust Environment
This package uses Rust for high-performance tokenization and vector search. You need to install Rust to build locally. Once installed, cargokit handles the build automatically when you run flutter run.
macOS / Linux:
# Install Rust via rustup
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
Windows:
# Download and run rustup-init.exe from https://rustup.rs
# Or use winget:
winget install Rustlang.Rustup
Required toolchains and targets (iOS/Android) are automatically installed by cargokit during the first build.
Note: After installing Rust, restart your terminal/IDE for the changes to take effect.
Flutter Dependency
dependencies:
  mobile_rag_engine:
    git:
      url: https://github.com/dev07060/mobile_rag_engine.git
  # Required for iOS static library loading
  flutter_rust_bridge: ^2.0.0
iOS Setup #
For iOS, the Rust library is statically linked. Add this import and use ExternalLibrary.process():
import 'dart:io' show Platform;
import 'package:flutter_rust_bridge/flutter_rust_bridge_for_generated.dart';
Initialization #
import 'dart:io' show Platform;
import 'package:flutter/services.dart' show rootBundle;
import 'package:mobile_rag_engine/mobile_rag_engine.dart';
import 'package:flutter_rust_bridge/flutter_rust_bridge_for_generated.dart';

Future<void> initializeRag() async {
  // Initialize Rust library (platform-specific)
  if (Platform.isIOS || Platform.isMacOS) {
    // iOS/macOS: static library loaded via DynamicLibrary.process()
    await RustLib.init(
      externalLibrary: ExternalLibrary.process(iKnowHowToUseIt: true),
    );
  } else {
    // Android/Linux/Windows: dynamic library
    await RustLib.init();
  }

  // Load tokenizer
  await initTokenizer(tokenizerPath: 'path/to/tokenizer.json');

  // Load ONNX model from the app's bundled assets
  final modelBytes = await rootBundle.load('assets/model.onnx');
  await EmbeddingService.init(modelBytes.buffer.asUint8List());

  // Initialize DB
  await initDb(dbPath: 'path/to/rag.db');
}
Note: On iOS, if you encounter symbol not found errors, ensure your Podfile has been updated with pod install after adding the dependency.
Adding Documents #
final text = "Flutter is a cross-platform UI framework.";
final embedding = await EmbeddingService.embed(text);
final result = await addDocument(
  dbPath: dbPath,
  content: text,
  embedding: embedding,
);
if (result.isDuplicate) {
  print("Document already exists");
}
// Rebuild index after bulk inserts
await rebuildHnswIndex(dbPath: dbPath);
Searching #
final query = "cross-platform development";
final queryEmbedding = await EmbeddingService.embed(query);
final results = await searchSimilar(
  dbPath: dbPath,
  queryEmbedding: queryEmbedding,
  topK: 5,
);
for (final doc in results) {
  print(doc);
}
Required Models #
You need a Sentence Transformer model in ONNX format.
# Quote the extra so shells such as zsh do not treat the brackets as a glob
pip install "optimum[exporters]"
optimum-cli export onnx \
  --model sentence-transformers/all-MiniLM-L6-v2 \
  ./model_output
Add model_output/model.onnx and tokenizer.json to your app's assets.
Releases #
- v0.3.0 - Rust Semantic Chunking - Migrated to Unicode-based semantic chunking
- v0.2.0 - LLM-Optimized Chunking - Added chunking and context assembly
Future Plans #
- INT8 quantization to reduce model size
- Korean-specific models (KoSimCSE, KR-SBERT)
- Chunking strategies for long documents ✅ (done in v0.2.0, v0.3.0)
- Hybrid search (keyword + semantic)
License #
MIT
Contributing #
Bug reports, feature requests, and PRs are all welcome.