mobile_rag_engine 0.18.2
A high-performance, on-device RAG (Retrieval-Augmented Generation) engine for Flutter. Run semantic search completely offline on iOS and Android with HNSW vector indexing.
Mobile RAG Engine #
Production-ready, fully local RAG (Retrieval-Augmented Generation) engine for Flutter.
Powered by a Rust core, it delivers lightning-fast vector search and embedding generation directly on the device. No servers, no API costs, no latency.
Memory note: The embedding vector path is copy-minimized in 0.18.0 with `Float32List` and isolate transfer optimizations, and the Rust core avoids unnecessary copies. The ingest text pipeline now keeps chunk content resident in Rust between chunking and DB commit: one `addDocument(content)` call sends the document body across FFI at ~2× document size (down from ~4× under the pre-IngestSession chain). Verified by `BenchmarkService.benchmarkIngestFfiTraffic` and `test/native/ingest_ffi_traffic_test.dart`: 256 KB doc → legacy 1019 KB / IngestSession 510 KB / 50% reduction.
`addDocumentUtf8(bytes)` and `addDocumentFromFile(path)` are now true Rust-side bytes pass-through entrypoints (Rust functions `prepareSourceIngestionFromUtf8` / `prepareSourceIngestionFromFile`): UTF-8 bytes go straight into Rust without an intermediate Dart `String`, and the file variant reads the body inside Rust so it never crosses FFI at all. Public `addDocument(String)`, `addDocumentUtf8(bytes)`, and `addDocumentFromFile(path)` signatures are unchanged. Measurements locked in by
`test/native/ingest_ffi_traffic_test.dart` (verified on macOS dev host, ASCII bench corpus):
| Metric | addDocument(String) | addDocumentUtf8(bytes) | addDocumentFromFile(path) |
|---|---|---|---|
| FFI body bytes (256 KB doc) | 256.1 KB | 256.1 KB | 0.0 KB |
| Peak RSS Δ (4 MB doc, full-process) | ~97 MB | ~17 MB | ~5 MB |
| Wall-clock p50 (1 MB doc, stub embeddings) | 173.7 ms | 166.2 ms | 162.4 ms |
The String path holds the full body in Dart memory plus chunker / staging intermediates before crossing FFI; the file path lets Rust read and chunk the body without ever materializing it in Dart, so peak RSS stays roughly an order of magnitude lower. Wall-clock gains are modest with stub embeddings (ONNX dominates real-world latency), but the heap and FFI-traffic gains compound on memory-constrained mobile devices.
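The trade-offs above suggest a simple rule: let Rust read the file when you have a stable path, send raw bytes when you only hold a buffer, and reserve the `String` path for text already materialized in Dart. A minimal sketch (the `ingest` helper name is illustrative, not part of the package API):

```dart
import 'dart:typed_data';

import 'package:mobile_rag_engine/mobile_rag_engine.dart';

// Hypothetical helper: pick the cheapest ingest path available.
Future<void> ingest({String? path, Uint8List? bytes, String? text}) async {
  final rag = MobileRag.instance;
  if (path != null) {
    // Body is read and chunked inside Rust; zero FFI body traffic.
    await rag.addDocumentFromFile(path);
  } else if (bytes != null) {
    // UTF-8 bytes cross FFI once, skipping the intermediate Dart String.
    await rag.addDocumentUtf8(bytes, name: 'picked-doc');
  } else if (text != null) {
    // Text is already in Dart memory; the String path is the only option.
    await rag.addDocument(text);
  }
}
```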
Why this package? #
No Rust Installation Required #
You do NOT need to install Rust, Cargo, or Android NDK.
This package includes pre-compiled binaries for iOS, Android, and macOS. Just pub add and run.
Performance #
| Feature | Pure Dart | Mobile RAG Engine (Rust) |
|---|---|---|
| Tokenization | Slow | 10x Faster (HuggingFace tokenizers) |
| Vector Search | O(n) | O(log n) (HNSW Index) |
| Memory Usage | High | Optimized (copy-minimized Rust core) |
100% Offline & Private #
Data never leaves the user's device. Perfect for privacy-focused apps (journals, secure chats, enterprise tools).
Features #
End-to-End RAG Pipeline #
One package, complete pipeline. From any document format to LLM-ready context.
Key Features #
| Category | Features |
|---|---|
| Document Input | PDF, DOCX, Markdown, Plain Text with smart dehyphenation; file-path and UTF-8 ingest fast paths |
| Chunking | Plain-text paragraph/line chunking with heading-aware split and tokenizer hard guard; Markdown structure-aware chunking with header-path metadata |
| Search | HNSW vector + BM25 keyword hybrid search with RRF fusion; metadata-first search with explicit context/chunk hydration |
| Storage | SQLite persistence, HNSW Index persistence (fast startup), connection pooling, resumable indexing |
| Collections | Collection-scoped ingest/search/rebuild via inCollection('id') |
| Performance | Rust core, 10x faster tokenization, thread control, memory optimized |
| Context | Engine-tokenizer exact context budget, adjacent chunk expansion, single source mode |
Requirements #
| Platform | Minimum Version |
|---|---|
| iOS | 13.0+ |
| Android | API 21+ (Android 5.0 Lollipop) |
| macOS | 10.15+ (Catalina) |
ONNX Runtime is bundled automatically via the `onnxruntime` plugin. No additional native setup required.
Installation #
1. Add the dependency #
dependencies:
  mobile_rag_engine: ^0.18.2
2. Download Model Files #
# Create assets folder
mkdir -p assets && cd assets
# Download all-MiniLM-L6-v2 model (INT8 quantized for ARM64, ~23MB)
curl -L -o model.onnx "https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/onnx/model_qint8_arm64.onnx"
curl -L -o tokenizer.json "https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/tokenizer.json"
Need multilingual (Korean, CJK, etc.)? See Model Setup Guide for BGE-m3 and other model options.
Quick Index #
Features #
- Adjacent Chunk Retrieval - Fetch surrounding context.
- Index Management - Stats, persistence, and recovery.
- Markdown Chunker - Structure-aware text splitting.
- Multi-Collection - Isolate ingest/search/rebuild by category.
- Prompt Compression - Reduce token usage.
- Search by Source - Filter results by document.
- Search Strategies - Tune ranking and retrieval.
Guides #
- Quick Start - Setup in 5 minutes.
- Model Setup - Choosing and downloading models.
- Release Build - Bundle size optimization for production.
- Troubleshooting - Common fixes.
- FAQ - Frequently asked questions.
Testing #
- Unit Testing - Mocking for isolated tests.
Initialization Parameters #
Initialize the engine once in your main() function:
await MobileRag.initialize(
tokenizerAsset: 'assets/tokenizer.json',
modelAsset: 'assets/model.onnx',
deferIndexWarmup: true,
);
// Before first search:
if (!MobileRag.instance.isIndexReady) {
await MobileRag.instance.warmupFuture;
}
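With `deferIndexWarmup: true`, the ready check above belongs in front of every first query. A minimal gating sketch, using only the APIs shown above (the `ensureReadyThenSearch` name is illustrative, not part of the package):

```dart
import 'package:mobile_rag_engine/mobile_rag_engine.dart';

// Hypothetical helper: block until the deferred index warmup finishes,
// then run a search and return the LLM-ready context text.
Future<String> ensureReadyThenSearch(String query) async {
  final rag = MobileRag.instance;
  if (!rag.isIndexReady) {
    await rag.warmupFuture; // completes once warmup is done
  }
  final result = await rag.search(query, tokenBudget: 2000);
  return result.context.text;
}
```

Calling this from a loading screen (or wrapping it in a `FutureBuilder`) keeps the first search from racing the deferred warmup.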
Then use it anywhere in your app:
class MySearchScreen extends StatelessWidget {
Future<void> _search() async {
// 2. Add Documents (auto-chunked & embedded)
await MobileRag.instance.addDocument(
'Flutter is a UI toolkit for building apps.',
);
await MobileRag.instance.addDocument(
'Dart is the language used to build Flutter apps.',
);
// File/UTF-8 fast paths are useful for large local documents.
await MobileRag.instance.addDocumentFromFile('/path/to/manual.pdf');
final noteBytes = await File('/path/to/notes.md').readAsBytes();
await MobileRag.instance.addDocumentUtf8(
noteBytes,
name: 'notes.md',
);
// File picker fallback (assumes `path` and `fileName` came from a picker
// result): prefer the Rust-side file-path fast path, and fall back when
// the selected document is not exposed as a stable local path.
try {
await MobileRag.instance.addDocumentFromFile(path, name: fileName);
} on RagError {
final bytes = await File(path).readAsBytes();
final lower = fileName.toLowerCase();
if (lower.endsWith('.txt') ||
lower.endsWith('.md') ||
lower.endsWith('.markdown')) {
await MobileRag.instance.addDocumentUtf8(bytes, name: fileName);
} else {
final text = await DocumentParser.parse(bytes);
await MobileRag.instance.addDocument(text, name: fileName);
}
}
// Indexing is automatic (debounced 500 ms).
// Optional: await MobileRag.instance.rebuildIndex(); // force an immediate rebuild
// 3. Search with LLM-ready context
final result = await MobileRag.instance.search(
'What is Flutter?',
tokenBudget: 2000,
);
print(result.context.text); // Ready to send to LLM
}
}
Metadata-First Search #
Use searchMeta when you want lightweight search metadata first, then explicitly assemble context or hydrate only the chunks you need.
final meta = await MobileRag.instance.searchMeta(
'What is Flutter?',
topK: 10,
);
try {
final context = await MobileRag.instance.assembleContext(
searchHandle: meta.handle,
tokenBudget: 2000,
);
final chunkIds = meta.hits.map((hit) => hit.chunkId.toInt()).toList();
final chunks = await MobileRag.instance.hydrateChunks(
searchHandle: meta.handle,
chunkIds: chunkIds,
);
final excerpts = await MobileRag.instance.getChunkExcerpts(
searchHandle: meta.handle,
chunkIds: chunkIds,
maxBytes: 256,
);
print(context.text);
print('hydrated=${chunks.length}, excerpts=${excerpts.length}');
} finally {
await meta.handle.dispose();
}
Multi-Collection (v1) #
Use collection scopes when you want independent rebuild boundaries per category.
final business = MobileRag.instance.inCollection('business');
final travel = MobileRag.instance.inCollection('travel');
await business.addDocument('Quarterly planning memo...');
await travel.addDocument('Kyoto itinerary...');
if (!travel.isIndexReady) {
await travel.warmupFuture;
}
final travelHits = await travel.searchHybrid('itinerary');
print(travelHits.length);
If you do not specify a collection, the engine uses the default __default__
collection for backward compatibility.
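A short sketch of how the default scope and a named scope stay isolated (the `demoIsolation` name is illustrative; the no-hit expectation follows from the independent-collection behavior described above):

```dart
import 'package:mobile_rag_engine/mobile_rag_engine.dart';

// Documents added without a scope land in the __default__ collection;
// a named scope only sees its own documents.
Future<void> demoIsolation() async {
  final rag = MobileRag.instance;
  final travel = rag.inCollection('travel');

  await rag.addDocument('Company handbook...'); // goes to __default__
  await travel.addDocument('Kyoto itinerary...'); // travel only

  final hits = await travel.searchHybrid('handbook');
  // Expect no hits: the handbook lives in __default__,
  // outside the travel scope.
  print(hits.length);
}
```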
Advanced Usage: For fine-grained control, use the high-level metadata lane (`searchMeta`, `assembleContext`, `hydrateChunks`, `getChunkExcerpts`) or service APIs such as `EmbeddingService` and `SourceRagService` directly. See the API Reference.
Sample App #
Check out the example application using this package. This desktop app demonstrates full RAG pipeline integration with an LLM (Gemma 2B) running locally on-device.
Contributing #
Bug reports, feature requests, and PRs are all welcome!
License #
This project is licensed under the MIT License.