mobile_rag_engine 0.18.6

A high-performance, on-device RAG (Retrieval-Augmented Generation) engine for Flutter. Run semantic search completely offline on iOS, Android, and macOS with HNSW vector indexing.

Mobile RAG Engine #

Production-ready, fully local RAG (Retrieval-Augmented Generation) engine for Flutter.

Powered by a Rust core, it runs vector search and embedding generation directly on the device. No servers, no API costs, no network round-trips.


Why this package? #

No Rust Installation Required #

You do NOT need to install Rust, Cargo, or Android NDK.

This package includes pre-compiled binaries for iOS, Android, and macOS. Just run flutter pub add mobile_rag_engine and build.

Performance #

| Feature | Pure Dart | Mobile RAG Engine (Rust) |
| --- | --- | --- |
| Tokenization | Slow | HuggingFace tokenizers (Rust) |
| Vector Search | O(n) | HNSW index, sub-linear retrieval |
| Memory Usage | High | Copy-minimized Rust core, Float32List zero-copy transport |

Numbers vary by device and corpus. See benchmark_service and the 0.18.0 retrieval-hot-path notes in CHANGELOG.md for measured deltas on your own hardware.
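To make the O(n) entry in the Pure Dart column concrete, here is a minimal sketch of a brute-force cosine-similarity scan. It is an illustration of the baseline only, not code from this package; the names cosineSimilarity and bruteForceSearch are hypothetical. Every query scores every stored vector, which is the cost an HNSW index is designed to avoid.

```dart
import 'dart:math' as math;
import 'dart:typed_data';

/// Cosine similarity between two equal-length vectors.
double cosineSimilarity(Float32List a, Float32List b) {
  assert(a.length == b.length, 'vectors must have the same dimension');
  var dot = 0.0, normA = 0.0, normB = 0.0;
  for (var i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA == 0.0 || normB == 0.0) return 0.0;
  return dot / (math.sqrt(normA) * math.sqrt(normB));
}

/// Linear scan (hypothetical helper): scores every stored vector, so the
/// cost per query grows as O(n) in the corpus size.
/// Returns the indices of the [topK] most similar vectors, best first.
List<int> bruteForceSearch(
  Float32List query,
  List<Float32List> corpus, {
  int topK = 3,
}) {
  final scored = <MapEntry<int, double>>[
    for (var i = 0; i < corpus.length; i++)
      MapEntry(i, cosineSimilarity(query, corpus[i])),
  ];
  scored.sort((x, y) => y.value.compareTo(x.value));
  return scored.take(topK).map((e) => e.key).toList();
}

void main() {
  final corpus = [
    Float32List.fromList([1, 0]),
    Float32List.fromList([0, 1]),
    Float32List.fromList([0.9, 0.1]),
  ];
  final hits = bruteForceSearch(Float32List.fromList([1, 0]), corpus, topK: 2);
  print(hits); // [0, 2]
}
```

A scan like this is fine for a few hundred vectors; the gap to an approximate index widens as the corpus grows.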

100% Offline & Private #

Data never leaves the user's device. Perfect for privacy-focused apps (journals, secure chats, enterprise tools).


Features #

End-to-End RAG Pipeline #

One package, complete pipeline. From supported document formats to LLM-ready context.

Key Features #

| Category | Features |
| --- | --- |
| Document Input | Text-layer PDF, Markdown, Plain Text, and beta DOCX support; file-path and UTF-8 ingest fast paths |
| Chunking | Plain-text paragraph/line chunking with heading-aware split and tokenizer hard guard; Markdown structure-aware chunking with header-path metadata |
| Search | HNSW vector + BM25 keyword hybrid search with RRF fusion; metadata-first search with explicit context/chunk hydration |
| Storage | SQLite persistence, HNSW index persistence (fast startup), connection pooling, resumable indexing |
| Collections | Collection-scoped ingest/search/rebuild via inCollection('id') |
| Performance | Rust core, 10x faster tokenization, thread control, memory optimized |
| Context | Engine-tokenizer exact context budget, adjacent chunk expansion, single source mode |

Support boundaries: text-layer PDFs are production-ready. Scanned or image-only PDFs should be routed through an OCR layer before indexing. DOCX extraction is available for early adopters, but complex DOCX layouts such as tables, headers, and footnotes should be treated as beta.
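The Search feature above mentions RRF (Reciprocal Rank Fusion) for combining vector and keyword results. As a rough illustration of the general technique, not the engine's actual implementation, the sketch below fuses two ranked lists of chunk IDs. The function name rrfFuse, the chunk IDs, and the constant k = 60 (a value commonly used for RRF) are all assumptions for this example.

```dart
/// Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank(d)).
/// [rankings] holds ranked ID lists (best first); ranks are 1-based.
/// k = 60 is an assumed default, not a value taken from this package.
List<int> rrfFuse(List<List<int>> rankings, {int k = 60, int topK = 10}) {
  final scores = <int, double>{};
  for (final ranking in rankings) {
    for (var i = 0; i < ranking.length; i++) {
      final contribution = 1 / (k + i + 1);
      scores.update(
        ranking[i],
        (s) => s + contribution,
        ifAbsent: () => contribution,
      );
    }
  }
  final ids = scores.keys.toList()
    ..sort((a, b) {
      final byScore = scores[b]!.compareTo(scores[a]!);
      return byScore != 0 ? byScore : a.compareTo(b); // deterministic ties
    });
  return ids.take(topK).toList();
}

void main() {
  final vectorHits = [7, 3, 9]; // hypothetical chunk IDs from vector search
  final keywordHits = [3, 5, 7]; // hypothetical chunk IDs from BM25
  print(rrfFuse([vectorHits, keywordHits])); // [3, 7, 5, 9]
}
```

Because RRF only looks at ranks, it needs no score normalization between the cosine-similarity and BM25 scales, which is the usual reason for choosing it in hybrid search.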


Requirements #

| Platform | Minimum Version |
| --- | --- |
| iOS | 13.0+ |
| Android | API 21+ (Android 5.0 Lollipop) |
| macOS | 10.15+ (Catalina) |

ONNX Runtime is bundled automatically via the onnxruntime plugin. No additional native setup required.


Installation #

1. Add the dependency #

dependencies:
  mobile_rag_engine: ^0.18.6

2. Download Model Files #

# Create assets folder
mkdir -p assets && cd assets

# Download all-MiniLM-L6-v2 model (INT8 quantized for ARM64, ~23MB)
curl -L -o model.onnx "https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/onnx/model_qint8_arm64.onnx"
curl -L -o tokenizer.json "https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/tokenizer.json"

Need multilingual support (Korean and other CJK languages, etc.)? See the Model Setup Guide for BGE-m3 and other model options.
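Flutter only bundles files that are declared as assets, so the downloaded files most likely also need an entry in pubspec.yaml. The fragment below is a sketch that assumes the engine loads them through Flutter's asset bundle, as the tokenizerAsset and modelAsset parameter names suggest, and that the files sit in the assets folder created above.

```yaml
flutter:
  assets:
    - assets/model.onnx
    - assets/tokenizer.json
```

The paths here match the ones passed to MobileRag.initialize in the Usage section.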



Usage #

Minimal Setup #

Initialize the engine once in your main() function. See the Quick Start Guide for the full parameter table.

await MobileRag.initialize(
  tokenizerAsset: 'assets/tokenizer.json',
  modelAsset: 'assets/model.onnx',
  deferIndexWarmup: true,
);

// Before first search, wait for BM25/HNSW warmup if you deferred it:
if (!MobileRag.instance.isIndexReady) {
  await MobileRag.instance.warmupFuture;
}

Adding Documents and Searching #

import 'dart:io';

import 'package:flutter/material.dart';
import 'package:mobile_rag_engine/mobile_rag_engine.dart';

class MySearchScreen extends StatelessWidget {
  const MySearchScreen({super.key});

  Future<void> _search() async {
    // 1. Add documents (auto-chunked and embedded). Indexing is auto-managed
    //    (debounced ~500ms). Only call rebuildIndex() if you need it now.
    await MobileRag.instance.addDocument(
      'Flutter is a UI toolkit for building apps.',
    );

    // File / UTF-8 fast paths are useful for large local documents.
    await MobileRag.instance.addDocumentFromFile('/path/to/manual.pdf');
    final noteBytes = await File('/path/to/notes.md').readAsBytes();
    await MobileRag.instance.addDocumentUtf8(noteBytes, name: 'notes.md');

    // 2. Search with LLM-ready context
    final result = await MobileRag.instance.search(
      'What is Flutter?',
      tokenBudget: 2000,
    );
    print(result.context.text); // Ready to send to LLM
  }

  // A StatelessWidget must implement build(); this button triggers _search().
  @override
  Widget build(BuildContext context) {
    return ElevatedButton(onPressed: _search, child: const Text('Search'));
  }
}
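The tokenBudget parameter above caps how much text ends up in the assembled context. The sketch below shows one way such a budget can be applied: greedy packing of ranked chunks. It is an illustration under stated assumptions, not the engine's algorithm. In particular, approxTokenCount is a hypothetical word-count stand-in, whereas the package states that it uses its own tokenizer for an exact budget.

```dart
/// Hypothetical stand-in for a real tokenizer: counts whitespace-separated
/// words. The real engine counts tokens with its own tokenizer.
int approxTokenCount(String text) {
  final trimmed = text.trim();
  return trimmed.isEmpty ? 0 : trimmed.split(RegExp(r'\s+')).length;
}

/// Greedily packs ranked chunks (best first) while staying within
/// [tokenBudget]. Chunks that would overflow are skipped, not truncated.
String packContext(List<String> rankedChunks, {required int tokenBudget}) {
  final picked = <String>[];
  var used = 0;
  for (final chunk in rankedChunks) {
    final cost = approxTokenCount(chunk);
    if (used + cost > tokenBudget) continue;
    picked.add(chunk);
    used += cost;
  }
  return picked.join('\n\n');
}

void main() {
  final context = packContext(
    [
      'Flutter is a UI toolkit.', // 5 words
      'It targets mobile, web, and desktop from one codebase.', // 9 words
      'Dart is its language.', // 4 words
    ],
    tokenBudget: 10,
  );
  // The second chunk is skipped (5 + 9 > 10); the third still fits.
  print(context);
}
```

Skipping rather than truncating keeps every included chunk intact, at the cost of sometimes leaving part of the budget unused.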

Handling File Picker Fallback #

addDocumentFromFile is the fastest path because the Rust core reads and chunks the file directly. Some platform pickers (cloud-backed pickers, content URIs without a stable local path, etc.) return data that is not exposed as a real filesystem path. In those cases, fall back to UTF-8 or parsed-text ingestion:

try {
  await MobileRag.instance.addDocumentFromFile(path, name: fileName);
} on RagError {
  final bytes = await File(path).readAsBytes();
  final lower = fileName.toLowerCase();
  if (lower.endsWith('.txt') ||
      lower.endsWith('.md') ||
      lower.endsWith('.markdown')) {
    await MobileRag.instance.addDocumentUtf8(bytes, name: fileName);
  } else {
    try {
      final text = await DocumentParser.parse(bytes);
      await MobileRag.instance.addDocument(text, name: fileName);
    } catch (error) {
      if (DocumentParser.isOcrRequiredPdfExtractionError(error)) {
        throw UnsupportedError(
          DocumentParser.userMessageForExtractionError(error),
        );
      }
      rethrow;
    }
  }
}
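The extension check in the fallback above can be pulled into a small pure function, which makes the routing decision easy to unit-test without touching the engine. This is an optional refactoring sketch; IngestLane and fallbackLaneFor are hypothetical names, not part of the package API.

```dart
/// Fallback lanes corresponding to the two branches in the snippet above.
enum IngestLane { utf8Bytes, parsedText }

/// Picks the fallback lane from the file name alone (case-insensitive).
/// Plain-text and Markdown files go to the UTF-8 lane; everything else
/// goes through document parsing first.
IngestLane fallbackLaneFor(String fileName) {
  const utf8Extensions = ['.txt', '.md', '.markdown'];
  final lower = fileName.toLowerCase();
  return utf8Extensions.any(lower.endsWith)
      ? IngestLane.utf8Bytes
      : IngestLane.parsedText;
}

void main() {
  print(fallbackLaneFor('NOTES.MD')); // IngestLane.utf8Bytes
  print(fallbackLaneFor('manual.pdf')); // IngestLane.parsedText
}
```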

Use searchMeta when you want lightweight search metadata first, then explicitly assemble context or hydrate only the chunks you need.

final meta = await MobileRag.instance.searchMeta(
  'What is Flutter?',
  topK: 10,
);

try {
  final context = await MobileRag.instance.assembleContext(
    searchHandle: meta.handle,
    tokenBudget: 2000,
  );

  final chunkIds = meta.hits.map((hit) => hit.chunkId.toInt()).toList();
  final chunks = await MobileRag.instance.hydrateChunks(
    searchHandle: meta.handle,
    chunkIds: chunkIds,
  );
  final excerpts = await MobileRag.instance.getChunkExcerpts(
    searchHandle: meta.handle,
    chunkIds: chunkIds,
    maxBytes: 256,
  );

  print(context.text);
  print('hydrated=${chunks.length}, excerpts=${excerpts.length}');
} finally {
  await meta.handle.dispose();
}

Multi-Collection (v1) #

Use collection scopes when you want independent rebuild boundaries per category.

final business = MobileRag.instance.inCollection('business');
final travel = MobileRag.instance.inCollection('travel');

await business.addDocument('Quarterly planning memo...');
await travel.addDocument('Kyoto itinerary...');

if (!travel.isIndexReady) {
  await travel.warmupFuture;
}
final travelHits = await travel.searchHybrid('itinerary');
print(travelHits.length);

If you do not specify a collection, the engine uses the default __default__ collection for backward compatibility.

Advanced Usage: For fine-grained control, use the high-level metadata lane (searchMeta, assembleContext, hydrateChunks, getChunkExcerpts) and the public API reference. Most apps should stay on the MobileRag facade.


Sample App #

Check out the example application using this package. This desktop app demonstrates full RAG pipeline integration with an LLM (Gemma 2B) running locally on-device.

mobile-ondevice-rag-desktop


Contributing #

Bug reports, feature requests, and PRs are all welcome!

License #

This project is licensed under the MIT License.

Publisher

glasses-dev.win (verified publisher)

Topics

#llm #machine-learning #semantic-search #vector-database #rag

Dependencies

flutter, flutter_rust_bridge, freezed_annotation, onnxruntime, path_provider, rag_engine_flutter
