
A high-performance, on-device RAG (Retrieval-Augmented Generation) engine for Flutter. Run semantic search completely offline on iOS and Android with HNSW vector indexing.

Mobile RAG Engine #


Production-ready, fully local RAG (Retrieval-Augmented Generation) engine for Flutter.

Powered by a Rust core, it delivers lightning-fast vector search and embedding generation directly on the device. No servers, no API costs, no network latency.


⚡️ Why this package? #

✅ No Rust Installation Required #

You do NOT need to install Rust, Cargo, or the Android NDK.

This package includes pre-compiled binaries for iOS, Android, and macOS. Just pub add and run.

(Powered by flutter_rust_bridge & cargokit)

🚀 Performance First #

| Feature | Pure Dart | Mobile RAG Engine (Rust) |
| --- | --- | --- |
| Tokenization | Slow | 10x faster (HuggingFace tokenizers) |
| Vector search | O(n) | O(log n) (HNSW index) |
| Memory usage | High | Optimized (zero-copy FFI) |

🔒 100% Offline & Private #

Data never leaves the user's device. Perfect for privacy-focused apps (journals, secure chats, enterprise tools).


✨ Features #

  • Cross-Platform: Works seamlessly on iOS, Android, and macOS
  • HNSW Vector Index: Fast approximate nearest-neighbor search (tested at scales of 10k+ docs)
  • Hybrid Search Ready: Supports semantic search combined with exact matching
  • Auto-Chunking: Intelligent text splitting strategies included (Unicode-based semantic chunking; see the sketch after this list)
  • Model Flexibility: Use standard ONNX models (e.g., bge-m3, all-MiniLM-L6-v2)
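The package's own chunker is Unicode-based and lives in the Rust core; its Dart API isn't shown on this page. Purely as an illustration of the idea, here is a naive, sentence-aligned chunker in plain Dart (the helper naiveChunks is made up for this sketch and is NOT the package's API):

// Hypothetical pre-processing step, not the package's built-in chunker:
// split long text into roughly fixed-size, sentence-aligned chunks.
List<String> naiveChunks(String text, {int maxChars = 500}) {
  final sentences = text.split(RegExp(r'(?<=[.!?])\s+'));
  final chunks = <String>[];
  var current = StringBuffer();
  for (final sentence in sentences) {
    if (current.length + sentence.length > maxChars && current.isNotEmpty) {
      chunks.add(current.toString().trim());
      current = StringBuffer();
    }
    current.write('$sentence ');
  }
  if (current.isNotEmpty) chunks.add(current.toString().trim());
  return chunks;
}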

📸 Benchmark Results #

(Benchmark screenshots: iOS and Android)


🛠 Installation #

1. Add the dependency #

dependencies:
  mobile_rag_engine: ^0.3.8

2. Download Model Files #

# Create assets folder
mkdir -p assets && cd assets

# Download BGE-m3 model (INT8 quantized, multilingual)
curl -L -o model.onnx "https://huggingface.co/Teradata/bge-m3/resolve/main/onnx/model_int8.onnx"
curl -L -o tokenizer.json "https://huggingface.co/BAAI/bge-m3/resolve/main/tokenizer.json"

📖 See Model Setup Guide for alternative models and production deployment strategies.
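Note that Flutter bundles assets inside the app package rather than as plain files on disk, so depending on how you load the model at runtime you may first need to copy the bundled assets to a real file path. A minimal sketch using path_provider (already a dependency of this package); the asset keys match the download step above:

import 'dart:io';
import 'package:flutter/services.dart' show rootBundle;
import 'package:path_provider/path_provider.dart';

/// Copy a bundled asset to the app documents directory, returning its path.
Future<String> assetToFile(String assetKey) async {
  final dir = await getApplicationDocumentsDirectory();
  final file = File('${dir.path}/${assetKey.split('/').last}');
  if (!await file.exists()) {
    final data = await rootBundle.load(assetKey);
    await file.writeAsBytes(
      data.buffer.asUint8List(data.offsetInBytes, data.lengthInBytes),
    );
  }
  return file.path;
}

// Usage: final tokenizerPath = await assetToFile('assets/tokenizer.json');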


⚡️ Quick Start #

Declare the assets folder in your pubspec.yaml, then initialize the engine and start searching in just a few lines of code:

import 'dart:io';

import 'package:mobile_rag_engine/mobile_rag_engine.dart';

void main() async {
  // 1. Initialize Rust library & services
  await RustLib.init(externalLibrary: ExternalLibrary.process(iKnowHowToUseIt: true));
  // On mobile, copy bundled assets to real file paths first (see the note above).
  await initTokenizer(tokenizerPath: 'assets/tokenizer.json');
  final modelBytes = await File('assets/model.onnx').readAsBytes();
  await EmbeddingService.init(modelBytes);

  const dbPath = 'rag.db'; // SQLite file that stores documents and vectors

  // 2. Embed and add a document, then build the index
  final embedding = await EmbeddingService.embed('Flutter is a UI toolkit.');
  await addDocument(
    dbPath: dbPath,
    content: 'Flutter is a UI toolkit.',
    embedding: embedding,
  );
  await rebuildHnswIndex(dbPath: dbPath);

  // 3. Search
  final queryEmbedding = await EmbeddingService.embed('What is Flutter?');
  final results = await searchSimilar(
    dbPath: dbPath,
    queryEmbedding: queryEmbedding,
    topK: 5,
  );

  print(results.first); // top match: "Flutter is a UI toolkit."
}

📊 Benchmarks #

Rust-powered components (M3 Pro macOS):

| Operation | Time | Notes |
| --- | --- | --- |
| Tokenization (234 chars) | 0.04 ms | HuggingFace tokenizers crate |
| HNSW search (100 docs) | 0.3 ms | instant-distance (O(log n)) |

These are the components where Rust provides 10-100x speedup over pure Dart implementations.

Embedding generation uses ONNX Runtime (platform-dependent, typically 25-100ms per text).
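To measure embedding latency on your own device, a minimal sketch using Dart's Stopwatch and the EmbeddingService API from the Quick Start (timings vary by device, model, and text length):

import 'package:mobile_rag_engine/mobile_rag_engine.dart';

Future<void> benchmarkEmbedding() async {
  // Assumes RustLib, the tokenizer, and EmbeddingService are already initialized.
  final sw = Stopwatch()..start();
  await EmbeddingService.embed('A short sample sentence to embed.');
  sw.stop();
  print('Embedding latency: ${sw.elapsedMilliseconds} ms');
}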


🏗 Architecture #

This package bridges the best of two worlds: Flutter for UI and Rust for heavy lifting.

(Architecture diagram)

| Component | Technology |
| --- | --- |
| Embedding | ONNX Runtime with quantized models (INT8) |
| Storage | SQLite for metadata + memory-mapped vector index |
| Search | instant-distance (HNSW) for low-latency retrieval |
| Tokenization | HuggingFace tokenizers crate |

🧩 Problems Solved #

iOS Cross-Compilation

The onig regex library blocked iOS builds (___chkstk_darwin symbol missing). Switched to pure Rust fancy-regex to fix it.

HNSW Index Timing

Rebuilding the HNSW index after every insert made bulk loading O(n²). The engine now rebuilds once after bulk inserts.
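A minimal sketch of the resulting bulk-load pattern, using the same addDocument and rebuildHnswIndex calls as the Quick Start (initialization and dbPath assumed):

import 'package:mobile_rag_engine/mobile_rag_engine.dart';

/// Bulk-load documents, rebuilding the HNSW index only once at the end.
Future<void> bulkLoad(String dbPath, List<String> docs) async {
  for (final doc in docs) {
    final embedding = await EmbeddingService.embed(doc);
    await addDocument(dbPath: dbPath, content: doc, embedding: embedding);
  }
  // One rebuild after all inserts instead of one per insert.
  await rebuildHnswIndex(dbPath: dbPath);
}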

Duplicate Document Handling

Added SHA256 content hashing to skip already-stored documents.
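The hashing happens inside the Rust core; purely to illustrate the idea in Dart (using package:crypto, which is not a dependency of this package):

import 'dart:convert';
import 'package:crypto/crypto.dart';

final _seenHashes = <String>{};

/// Skip content we have already stored, keyed by its SHA256 digest.
bool isDuplicate(String content) {
  final digest = sha256.convert(utf8.encode(content)).toString();
  return !_seenHashes.add(digest); // add() returns false if already present
}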

ONNX Runtime Thread Safety

OrtSession isn't thread-safe, so the engine processes embeddings sequentially. This is still fast enough, since individual embeddings are quick.
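In app code this means awaiting each embedding before starting the next rather than batching them concurrently; a sketch of the pattern (EmbeddingService as in the Quick Start):

import 'package:mobile_rag_engine/mobile_rag_engine.dart';

Future<void> embedMany(List<String> texts) async {
  // Deliberately sequential: Future.wait(texts.map(EmbeddingService.embed))
  // would hit the single, non-thread-safe OrtSession concurrently.
  for (final text in texts) {
    final embedding = await EmbeddingService.embed(text);
    // ...use `embedding`, e.g. pass it to addDocument(...)
  }
}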


📦 Model Options #

| Model | Size | Best For |
| --- | --- | --- |
| Teradata/bge-m3 (INT8) | ~200 MB | Multilingual (Korean, English, etc.) |
| all-MiniLM-L6-v2 | ~25 MB | English only, faster |

Custom Models: Export any Sentence Transformer to ONNX:

pip install optimum[exporters]
optimum-cli export onnx --model sentence-transformers/YOUR_MODEL ./output

📋 Releases #

  • v0.3.0 - Rust Semantic Chunking - Unicode-based semantic chunking
  • v0.2.0 - LLM-Optimized Chunking - Chunking and context assembly

🛣 Roadmap #

  • ✅ INT8 quantization support
  • ✅ Chunking strategies for long documents
  • ❌ Korean-specific models (KoSimCSE, KR-SBERT)
  • ❌ Hybrid search (keyword + semantic)
  • ❌ iOS/Android On-Demand Resources

🤝 Contributing #

Bug reports, feature requests, and PRs are all welcome!

📄 License #

This project is licensed under the MIT License.
