flutter_hybrid_search
An offline hybrid search engine for Flutter that combines vector similarity, FTS5 full-text search, typo-tolerant keyword matching, and heuristic reranking — entirely on-device, with no cloud dependency and no network latency.
Features
- Vector similarity search — cosine distance on precomputed Float32 embeddings
- HNSW approximate index — sub-millisecond vector search for corpora >= 1 000 entries
- FTS5 full-text search — exact keyword matching via SQLite FTS5
- Typo-tolerant matching — 1-character edit distance (substitution, insertion, deletion)
- Heuristic reranking — FTS boost + typo boost + concise-question boost + deduplication
- Pluggable embedder — implement
Embedderwith any model (BERT, TF-IDF, ...) - Pluggable reranker — implement
RerankerInterfacefor custom ranking logic - Float16 binary format — compact precomputed embeddings (50 % smaller than Float32)
- Configurable schema — custom table/column names for any SQLite database
- Zero cloud dependency — works fully offline on Android, iOS, macOS, Linux, Windows
- Structured logging — diagnostics via package:logging (zero overhead when unused)
- Search metadata — per-phase timing and candidate counts for performance profiling
- Batch search — search multiple queries in one call
- Embed cache — LRU cache for repeated/autocomplete queries
- Dimension validation — early error on embedding/config mismatch
Getting started
dependencies:
  flutter_hybrid_search: ^1.1.0
  sqflite: ^2.4.2
Architecture
User query
    |
    v
+-----------+   Float32 vector    +---------------+
| Embedder  |-------------------->| Vector scorer |  cosine / HNSW
+-----------+                     +-------+-------+
                                          | top-N candidates
+-----------+                             v
| SQLite DB |--> FTS5 MATCH -->  +----------------+
|   FTS5    |--> Typo scan --->  | Candidate pool |  union
+-----------+                    +--------+-------+
                                          |
                                  +-------v-------+
                                  |   Reranker    |  boosts + dedup
                                  +-------+-------+
                                          |
                                  +-------v--------+
                                  | Keyword filter |  overlap check
                                  +-------+--------+
                                          |
                                          v
                                  List<SearchResult>
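The vector scorer at the top of the diagram ranks entries by cosine similarity between the query embedding and each precomputed embedding. A minimal Python sketch of that scoring step (illustrative only; the package implements it in Dart, and switches to HNSW above the configured threshold):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1] for non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_n(query_vec, embeddings, n):
    """Linear-scan scoring: score every entry, keep the best n (the small-corpus path)."""
    scored = [(cosine_similarity(query_vec, e), i) for i, e in enumerate(embeddings)]
    scored.sort(reverse=True)
    return scored[:n]
```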
Usage
1. Implement Embedder
import 'package:flutter_hybrid_search/flutter_hybrid_search.dart';
class MyEmbedder implements Embedder {
  @override
  Future<Embedding> embed(String text) async {
    // Run your model here (ONNX, TFLite, etc.).
    return myModel.encode(text); // must return Embedding (Float32List)
  }

  @override
  List<String> contentWords(String text) {
    // Return meaningful tokens, stopwords removed.
    return text
        .toLowerCase()
        .split(RegExp(r'\s+'))
        .where((w) => w.isNotEmpty && !_stopwords.contains(w))
        .toList();
  }
}
2. Load assets and build the engine
// Load Float16 embeddings from a binary asset.
final bytes = (await rootBundle.load('assets/embeddings.bin')).buffer.asUint8List();
final embeddings = Float16Store.decode(bytes);
// Open the SQLite database.
final db = await openDatabase('kb.db', readOnly: true);
// Create and initialise the engine.
final engine = HybridSearchEngine(
  db: db,
  embeddings: embeddings,
  embedder: MyEmbedder(),
);
await engine.initialize();
3. Search
final results = await engine.search('What is Flutter?', limit: 3);
for (final r in results) {
  print('${r.score.toStringAsFixed(3)} ${r.entry.question}');
  print(r.entry.answer);
}
4. Search with metadata (profiling)
final (:results, :metadata) = await engine.searchWithMetadata('flutter');
print('Total: ${metadata.totalMs.toStringAsFixed(1)} ms');
print('Embed: ${metadata.embedMs.toStringAsFixed(1)} ms');
print('Vector: ${metadata.vectorMs.toStringAsFixed(1)} ms');
print('Candidates: ${metadata.candidateCount}');
5. Batch search
final batch = await engine.searchBatch(
  ['dart', 'flutter', 'widgets'],
  limit: 3,
);
for (final results in batch) {
  print(results.first.entry.question);
}
6. Custom configuration
final engine = HybridSearchEngine(
  db: db,
  embeddings: embeddings,
  embedder: MyEmbedder(),
  config: const HybridSearchConfig(
    candidatePoolSize: 80,   // more candidates -> better recall
    hnswThreshold: 500,      // enable HNSW at 500+ entries
    embeddingDim: 256,       // match your model's output size
    tableName: 'articles',   // custom DB schema
    questionColumn: 'title',
    answerColumn: 'body',
  ),
  reranker: const HeuristicReranker(), // or your own RerankerInterface
  embedCacheSize: 64,                  // LRU cache for embed() results
);
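The embedCacheSize parameter bounds an LRU cache of embed() results, which pays off for repeated or autocomplete-style queries that would otherwise rerun the model. The behaviour can be sketched in Python with an OrderedDict (illustrative; the package's cache is internal and the class name here is hypothetical):

```python
from collections import OrderedDict

class LruEmbedCache:
    """Bounded cache of query -> embedding, evicting the least recently used entry."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self._cache = OrderedDict()

    def get_or_embed(self, query, embed_fn):
        if query in self._cache:
            self._cache.move_to_end(query)   # cache hit: mark as most recently used
            return self._cache[query]
        vector = embed_fn(query)             # cache miss: run the (expensive) model
        self._cache[query] = vector
        if len(self._cache) > self.capacity:
            self._cache.popitem(last=False)  # evict the least recently used entry
        return vector
```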
7. Custom reranker
class CategoryBoostReranker implements RerankerInterface {
  const CategoryBoostReranker(this.preferred);

  final String preferred;

  @override
  List<SearchResult> rerank(
    String query,
    RerankerCandidates candidates,
    Set<int> keywordMatchIds, {
    int limit = 3,
    Embedding? queryEmbedding,
    Set<int>? ftsIds,
    List<String>? contentWords,
  }) {
    final sorted = candidates.toList()
      ..sort((a, b) {
        final sa = a.vectorScore + (a.entry.category == preferred ? 0.3 : 0.0);
        final sb = b.vectorScore + (b.entry.category == preferred ? 0.3 : 0.0);
        return sb.compareTo(sa);
      });
    return sorted
        .take(limit)
        .map((c) => SearchResult(
              entry: c.entry,
              score: c.vectorScore,
              method: 'category_boost',
            ))
        .toList();
  }
}
8. Logging
import 'package:logging/logging.dart';
Logger.root.level = Level.FINE;
Logger.root.onRecord.listen((record) {
  print('${record.level.name}: ${record.loggerName}: ${record.message}');
});
// Now all engine operations log timing and diagnostics:
// FINE: HybridSearchEngine: Initialized in 42 ms: 500 entries, dim=128, HNSW=false.
// FINE: HybridSearchEngine: Search "flutter": 3 results in 12.4 ms (embed=5.1, vec=0.3, fts=2.1, typo=1.8, rerank=3.1).
Database schema
The default schema expected by the engine:
CREATE TABLE entries (
  id INTEGER PRIMARY KEY,  -- 1-based, matches embedding index
  category TEXT NOT NULL,
  question TEXT NOT NULL,
  answer TEXT NOT NULL
);

-- FTS5 virtual table for full-text search over 'question'.
CREATE VIRTUAL TABLE fts USING fts5(
  question,
  content=entries,
  content_rowid=id
);
All column and table names are overridable via HybridSearchConfig.
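The same DDL can be sanity-checked outside the app with Python's sqlite3 module (requires an SQLite build with FTS5 enabled, which standard CPython distributions include). Note that with content= external-content tables, the FTS index is not populated automatically; insert into it explicitly, or via triggers:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript('''
CREATE TABLE entries (
  id INTEGER PRIMARY KEY,
  category TEXT NOT NULL,
  question TEXT NOT NULL,
  answer TEXT NOT NULL
);
CREATE VIRTUAL TABLE fts USING fts5(question, content=entries, content_rowid=id);
''')
conn.execute("INSERT INTO entries VALUES (1, 'basics', 'What is Flutter?', 'A UI toolkit.')")
# External-content FTS5 tables must be synced by hand (or with triggers).
conn.execute("INSERT INTO fts(rowid, question) SELECT id, question FROM entries")
rows = conn.execute("SELECT rowid FROM fts WHERE fts MATCH 'flutter'").fetchall()
```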
Embedding binary format
The Float16Store.decode method reads this layout (written by a Python
training script or any compatible tool):
| Offset | Size | Field |
|---|---|---|
| 0 | 4 bytes | count (uint32, little-endian) |
| 4 | 4 bytes | dimension (uint32, little-endian) |
| 8+ | count x dim x 2 bytes | Float16 vectors (IEEE 754, little-endian) |
Python writer:
import struct
import numpy as np

vectors = embeddings.astype(np.float16)  # embeddings: np.ndarray of shape (N, D)
n, d = vectors.shape
with open('embeddings.bin', 'wb') as f:
    f.write(struct.pack('<II', n, d))
    vectors.tofile(f)
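A matching stdlib-only reader shows the layout from the consumer side (struct's 'e' format character is IEEE 754 half precision; this is a sketch mirroring what Float16Store.decode does in Dart):

```python
import struct

def decode_embeddings(data: bytes) -> list[list[float]]:
    """Read the 8-byte header, then count * dim little-endian float16 values."""
    count, dim = struct.unpack_from('<II', data, 0)
    expected = 8 + count * dim * 2
    if len(data) != expected:
        raise ValueError(f'expected {expected} bytes, got {len(data)}')
    flat = struct.unpack_from(f'<{count * dim}e', data, 8)
    return [list(flat[i * dim:(i + 1) * dim]) for i in range(count)]

# Round-trip a tiny 2 x 3 store (all values exactly representable in float16).
blob = struct.pack('<II', 2, 3) + struct.pack('<6e', 1.0, 0.5, -2.0, 0.0, 0.25, 4.0)
vectors = decode_embeddings(blob)
```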
API reference
HybridSearchEngine
| Member | Description |
|---|---|
| HybridSearchEngine({db, embeddings, embedder, config?, reranker?, embedCacheSize?}) | Constructor (validates embedding dimensions) |
| initialize() | Builds the HNSW index and loads the question map. Call once before searching. Concurrent-safe. |
| search(query, {limit}) -> Future<List<SearchResult>> | Main search method |
| searchWithMetadata(query, {limit}) -> Future<({results, metadata})> | Search with timing diagnostics |
| searchBatch(queries, {limit}) -> Future<List<List<SearchResult>>> | Batch search for multiple queries |
| dispose() | Closes the database connection |
| isInitialized | Whether the engine is ready for queries |
| entryCount | Number of embeddings (available before initialize()) |
Embedder
| Member | Description |
|---|---|
| embed(text) -> Future<Embedding> | Converts text to a dense vector |
| contentWords(text) -> List<String> | Stopword-stripped tokens for keyword matching |
SearchMetadata
| Field | Type | Description |
|---|---|---|
| embedMs | double | Time spent generating the query embedding |
| vectorMs | double | Time spent on vector scoring |
| ftsMs | double | Time spent on FTS5 search |
| typoMs | double | Time spent on the typo-tolerant scan |
| rerankMs | double | Time spent in the reranker |
| totalMs | double | Total wall-clock time |
| candidateCount | int | Total candidates in the pool |
| vectorCandidateCount | int | Candidates from vector scoring |
| keywordCandidateCount | int | Candidates from keyword matching |
HybridSearchConfig
| Parameter | Default | Description |
|---|---|---|
| candidatePoolSize | 50 | Max candidates fed to the reranker |
| ftsLimit | 50 | Max FTS5 results per query |
| hnswThreshold | 1000 | Min entries to enable the HNSW index |
| hnswSearchK | 100 | Neighbours to fetch from HNSW (must be >= candidatePoolSize) |
| hnswM | 16 | HNSW graph connections per node |
| hnswEf | 64 | HNSW search candidate list size |
| embeddingDim | 128 | Vector dimension (must match your model) |
| tableName | 'entries' | SQLite table name |
| ftsTableName | 'fts' | FTS5 virtual table name |
| idColumn | 'id' | Primary key column |
| categoryColumn | 'category' | Category column |
| questionColumn | 'question' | Question / title column |
| answerColumn | 'answer' | Answer / body column |
SearchRanking — boost constants
| Constant | Value | Trigger |
|---|---|---|
| ftsBoost | 0.5 | Entry found by FTS5 MATCH |
| typoBoost | 0.7 | Entry found by typo tolerance only |
| conciseMatchBoost | 0.5 (max) | Short, on-topic question |
| perfectScoreThreshold | 0.999 | A near-perfect vector match short-circuits to a single result |
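As a rough illustration of how additive boosts like these shape the final ranking (the exact combination inside HeuristicReranker is an implementation detail; this function is a hypothetical sketch using the constants above):

```python
def boosted_score(vector_score: float, in_fts: bool, typo_only: bool,
                  concise_signal: float) -> float:
    """Hypothetical additive combination of the boost constants in the table above."""
    score = vector_score
    if in_fts:
        score += 0.5                       # ftsBoost: confirmed by FTS5 MATCH
    elif typo_only:
        score += 0.7                       # typoBoost: found only via typo tolerance
    score += min(concise_signal, 0.5)      # conciseMatchBoost, capped at 0.5
    return score
```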
Float16Store
| Method | Description |
|---|---|
| Float16Store.decode(bytes) -> List<Embedding> | Decode a full embedding file |
| Float16Store.peekCount(bytes) -> int | Read the vector count from the header |
| Float16Store.peekDimension(bytes) -> int | Read the dimension from the header |
Search pipeline in detail
- Embed query — Embedder.embed(query) -> Embedding (cached via LRU)
- Vector scoring — cosine similarity against all precomputed embeddings
  - Small corpus (< hnswThreshold): O(n) linear scan
  - Large corpus (>= hnswThreshold): HNSW approximate search (sub-ms)
- FTS5 search — MATCH query on the question column; retries with a single word when there are no results
- Typo-tolerant scan — Levenshtein-1 match on all question texts
- Candidate pool — union of top-N by vector score + all keyword matches
- Rerank — apply boost signals, sort, deduplicate by question text
- Keyword-overlap filter — discard results with zero word overlap to query
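The typo-tolerant step only needs to decide whether two strings are within one edit of each other, which can be done in a single pass without computing the full Levenshtein matrix. A Python sketch (transpositions count as two edits here and are not matched):

```python
def within_one_edit(a: str, b: str) -> bool:
    """True if a equals b or differs by one substitution, insertion, or deletion."""
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # ensure a is the shorter (or equal-length) string
    i = j = 0
    edits = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            i += 1
            j += 1
        else:
            edits += 1
            if edits > 1:
                return False
            if len(a) == len(b):
                i += 1  # substitution: advance both
            j += 1      # insertion/deletion: advance the longer string only
    return edits + (len(b) - j) <= 1  # account for a trailing extra character
```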
Pair with dart_wordpiece
For a complete offline pipeline, pair this package with
dart_wordpiece for BERT
tokenization:
class BertEmbedder implements Embedder {
  BertEmbedder(this._session, this._tokenizer);

  final OrtSession _session;
  final WordPieceTokenizer _tokenizer;

  @override
  Future<Embedding> embed(String text) async {
    final output = _tokenizer.encode(text);
    final inputs = {
      'input_ids': OrtValueTensor.createTensorWithDataList(output.inputIdsInt64, [1, 64]),
      'attention_mask': OrtValueTensor.createTensorWithDataList(output.attentionMaskInt64, [1, 64]),
      'token_type_ids': OrtValueTensor.createTensorWithDataList(output.tokenTypeIdsInt64, [1, 64]),
    };
    final outputs = _session.run(OrtRunOptions(), inputs);
    final raw = outputs[0]!.value as List<List<List<double>>>;
    return Embedding.fromList(_meanPool(raw[0], output.realLength));
  }

  @override
  List<String> contentWords(String text) => _tokenizer.contentWords(text);
}
Performance
Benchmarked on a mid-range Android device with 500 entries and 128-dim BERT-Tiny:
| Step | Time |
|---|---|
| Embedding generation | 10-50 ms |
| Vector search (linear, 500 entries) | < 1 ms |
| Vector search (HNSW, 10 000 entries) | < 2 ms |
| FTS5 query | < 5 ms |
| Typo-tolerant scan | < 10 ms |
| Reranking | < 5 ms |
| Total | < 100 ms typical |
Contributing
Issues and pull requests are welcome!
- Run flutter analyze — zero warnings required
- Run flutter test — all tests must pass
- Follow the Dart style guide
License
MIT
Libraries
- flutter_hybrid_search
- Offline hybrid search engine for Flutter.