model2vec
High-performance, local text embeddings for Dart and Flutter. A Dart wrapper around model2vec-rs using Rust FFI and Native Assets. Model2Vec creates small, fast, and effective text embeddings by distilling knowledge from large language models into a simple vocabulary-based look-up table.
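To make the look-up-table idea concrete, here is a minimal pure-Dart sketch of how a static embedding could be assembled: each token's vector is fetched from a table and the vectors are averaged. The function name `embedByLookup`, the `Map`-based table, and the choice to skip unknown tokens are assumptions made for illustration; this is not the package's or the Rust engine's actual code.

```dart
import 'dart:typed_data';

/// Averages the static vectors of every token found in [table].
/// Tokens missing from the table are skipped (an assumption of this sketch).
Float32List embedByLookup(
  List<String> tokens,
  Map<String, Float32List> table,
  int dimension,
) {
  final sum = Float32List(dimension);
  var found = 0;
  for (final token in tokens) {
    final vector = table[token];
    if (vector == null) continue;
    for (var i = 0; i < dimension; i++) {
      sum[i] += vector[i];
    }
    found++;
  }
  if (found == 0) return sum; // All zeros when no token matched.
  for (var i = 0; i < dimension; i++) {
    sum[i] /= found;
  }
  return sum;
}
```

Because the work per token is a table read plus a vector addition, no neural network runs at inference time, which is what keeps this family of models fast.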
Key Features
- Extreme Performance: Built on top of a highly optimized Rust engine. Up to ~1.7x faster than the official Python implementation, generating a single embedding in a few hundred microseconds.
- Compact & Quantized: Most models are roughly 25MB - 150MB (the multilingual model is ~500MB). Perfect for edge computing.
- Massive Streaming: Built-in generateEmbeddingStream for processing millions of rows without blocking the Event Loop or overflowing RAM.
- Hugging Face Integration: Automatically downloads and caches models directly from the Hugging Face Hub.
- Zero-Stutter Async: Transparently runs heavy tokenization and math in background Dart Isolates using Async methods.
- Vector Utilities: Ships with high-performance mathematical tools (cosineSimilarity, quantizeToInt8, similaritySearch, etc.).
Recommended Models
Model2Vec provides a variety of pre-trained models optimized for different use cases. These can be loaded directly via their Hugging Face model ID.
| Model ID | Language | Distilled From | Params | Dimension | Size |
|---|---|---|---|---|---|
| minishlab/potion-base-32M | English | bge-base-en-v1.5 | 32.3M | 512 | ~150MB |
| minishlab/potion-multilingual-128M | Multi | bge-m3 | 128M | 768 | ~500MB |
| minishlab/potion-retrieval-32M | English | bge-base-en-v1.5 | 32.3M | 512 | ~150MB |
| minishlab/potion-code-16M | Code | CodeRankEmbed | 16M | 384 | ~80MB |
| minishlab/potion-base-8M | English | bge-base-en-v1.5 | 7.5M | 256 | ~50MB |
| minishlab/potion-base-4M | English | bge-base-en-v1.5 | 3.7M | 128 | ~30MB |
| minishlab/potion-base-2M | English | bge-base-en-v1.5 | 1.8M | 64 | ~25MB |
Installation
Add model2vec to your pubspec.yaml:
dependencies:
  model2vec: any
Or add it using the command line:
dart pub add model2vec
Requires Dart SDK 3.10.0+ and Rust toolchain 1.86.0+ (used to build the native library via Native Assets).
Quick Start
import 'package:model2vec/model2vec.dart';

void main() {
  final m2v = Model2Vec.instance;

  // Initialize with a model from Hugging Face
  m2v.initEmbedder('minishlab/potion-base-2M');

  // Generate an embedding
  final embedding = m2v.generateEmbedding('Dart FFI is blazingly fast 🚀');

  print('Vector dimension: ${m2v.embeddingDimension}');
  print('Vocabulary size: ${m2v.vocabularySize}');
}
Recipes & Patterns
1. Advanced Batch Processing
Process multiple strings at once for maximum hardware utilization. You can control sequence truncation and batch sizes.
final texts = ['Dart', 'Rust', 'Flutter'];

final embeddings = m2v.generateBatchEmbeddings(
  texts,
  maxLength: 256, // Truncate strings longer than 256 tokens
  batchSize: 1024, // Internal chunks sent to the FFI layer
);
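The two parameters above can be pictured with a small pure-Dart sketch: truncation caps the number of tokens per text, and batching splits the input list into fixed-size chunks. The helper names `truncateTokens` and `chunk` are hypothetical and only illustrate the idea; the package performs these steps internally and may do so differently.

```dart
/// Keeps at most [maxLength] tokens of a tokenized text.
List<String> truncateTokens(List<String> tokens, int maxLength) {
  return tokens.length <= maxLength ? tokens : tokens.sublist(0, maxLength);
}

/// Splits [items] into consecutive chunks of at most [batchSize] elements.
List<List<T>> chunk<T>(List<T> items, int batchSize) {
  if (batchSize <= 0) {
    throw ArgumentError.value(batchSize, 'batchSize', 'must be positive');
  }
  final chunks = <List<T>>[];
  for (var start = 0; start < items.length; start += batchSize) {
    final end = start + batchSize < items.length
        ? start + batchSize
        : items.length;
    chunks.add(items.sublist(start, end));
  }
  return chunks;
}
```

With five inputs and a batch size of two, `chunk` yields three chunks, the last holding the single leftover item.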
2. Massive Data Streaming
When reading gigabytes of text from files or databases, loading everything into memory will crash the app. Use the Streaming API to handle data in chunks automatically.
import 'dart:convert';
import 'dart:io';

Future<void> processHugeFile() async {
  // Assumes `m2v` is an initialized Model2Vec instance (see Quick Start).
  final fileStream = File('massive_dataset.txt')
      .openRead()
      .transform(utf8.decoder)
      .transform(const LineSplitter());

  // Converts a Stream<String> into a Stream<Float32List>
  final embeddingStream = m2v.generateEmbeddingStream(
    fileStream,
    batchSize: 500, // Process 500 strings at a time
    useIsolate: true, // Run math in a background isolate
  );

  await for (final embedding in embeddingStream) {
    saveToDb(embedding); // Placeholder for your own persistence code
  }
}
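The chunking behind a streaming API like this can be sketched with an `async*` generator that buffers items and emits them in groups. The function `batched` below is a hypothetical helper written for illustration, not part of the package, and the real implementation may differ.

```dart
/// Groups a stream of items into lists of at most [batchSize] items.
Stream<List<T>> batched<T>(Stream<T> source, int batchSize) async* {
  var buffer = <T>[];
  await for (final item in source) {
    buffer.add(item);
    if (buffer.length == batchSize) {
      yield buffer;
      buffer = <T>[]; // Start a fresh list so emitted batches are not mutated.
    }
  }
  if (buffer.isNotEmpty) yield buffer; // Flush the final partial batch.
}
```

Only one batch is held in memory at a time, and `await for` pauses the source while the consumer is busy, which is why this pattern keeps memory usage flat for very large inputs.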
3. Asynchronous Isolate Execution
Never block the main thread. If you are building a Flutter app, always use the Async variants to perform generation in a background Isolate.
final embedding = await m2v.generateEmbeddingAsync('A very long text...');
final batch = await m2v.generateBatchEmbeddingsAsync(['A', 'B', 'C']);
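For readers unfamiliar with isolates, the general pattern of moving CPU-heavy work off the main isolate can be sketched with `Isolate.run` from `dart:isolate`. The function `heavyWork` is a stand-in invented for this example; it is not the package's embedding code, and how the Async methods are implemented internally is not shown here.

```dart
import 'dart:isolate';
import 'dart:typed_data';

/// Stand-in for CPU-heavy work (hypothetical, not the library's code).
Float32List heavyWork(String text) {
  final out = Float32List(4);
  for (var i = 0; i < text.length; i++) {
    out[i % out.length] += text.codeUnitAt(i).toDouble();
  }
  return out;
}

/// Runs [heavyWork] in a short-lived background isolate and returns its result.
Future<Float32List> heavyWorkAsync(String text) {
  return Isolate.run(() => heavyWork(text));
}
```

The calling isolate stays free to render frames while the computation runs, and the result is handed back when the background isolate exits.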
4. Vector Math & Quantization
The library ships with Model2VecUtils, a suite of math operations tuned for embeddings.
final query = m2v.generateEmbedding('cat');
final candidates = [
  m2v.generateEmbedding('dog'),
  m2v.generateEmbedding('space'),
];

// 1. Semantic Similarity (Cosine)
final sim = Model2VecUtils.cosineSimilarity(query, candidates[0]);

// 2. Threshold Searching (Find all matches > 80%)
final matches = Model2VecUtils.similaritySearchWithThreshold(
  query, candidates, threshold: 0.8,
);
// 3. Scalar Quantization (Compress Float32 to Int8 to save 4x RAM)
final compressed = Model2VecUtils.quantizeToInt8(query);
// 4. Mean Pooling (Average multiple vectors into one)
final sentenceVector = Model2VecUtils.meanPooling(candidates);
// 5. DB Serialization
final base64String = Model2VecUtils.toBase64(query);
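For reference, cosine similarity and a threshold filter can be written in a few lines of plain Dart. The functions `cosine` and `indicesAbove` below are illustrative re-implementations under the standard definition of cosine similarity; they are not the source of Model2VecUtils, and details such as the handling of zero vectors are assumptions.

```dart
import 'dart:math' as math;
import 'dart:typed_data';

/// Cosine similarity: dot(a, b) / (|a| * |b|).
double cosine(Float32List a, Float32List b) {
  if (a.length != b.length) {
    throw ArgumentError('Vectors must have equal length');
  }
  var dot = 0.0, normA = 0.0, normB = 0.0;
  for (var i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA == 0 || normB == 0) return 0.0; // Convention chosen for this sketch.
  return dot / (math.sqrt(normA) * math.sqrt(normB));
}

/// Returns the indices of all [docs] whose similarity to [query] exceeds [threshold].
List<int> indicesAbove(
  Float32List query,
  List<Float32List> docs,
  double threshold,
) {
  final hits = <int>[];
  for (var i = 0; i < docs.length; i++) {
    if (cosine(query, docs[i]) > threshold) hits.add(i);
  }
  return hits;
}
```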
API Reference
Core Methods (Model2Vec class)
| Method / Property | Description |
|---|---|
| initEmbedder(path) | Initializes the model from a Hugging Face repo ID or local path. |
| initEmbedderAdvanced(...) | Advanced initialization with custom cacheDirectory, hfToken, or normalize overrides. |
| initEmbedderFromBytes(...) | Initializes the model directly from raw Uint8List bytes (model.safetensors, tokenizer.json, etc). |
| getRecommendedModels() | Returns a list of officially supported models. |
| tokenize(text) | Runs the internal BPE tokenizer and returns a List<String>. |
| generateEmbedding(text) | Synchronously generates a Float32List embedding vector. |
| generateBatchEmbeddings(texts) | Synchronously generates embeddings for a List<String> using Rust SIMD. |
| generateEmbeddingAsync(text) | Asynchronously generates an embedding in a background Isolate. |
| generateEmbeddingStream(stream) | Processes a huge Stream<String> into a Stream<Float32List> in batches. |
| embeddingDimension | Property returning the vector size (e.g., 256, 384, 512). |
| vocabularySize | Property returning the number of tokens in the model's vocabulary. |
Math Utilities (Model2VecUtils class)
| Method | Description |
|---|---|
| cosineSimilarity(a, b) | Calculates cosine similarity (-1.0 to 1.0) between two vectors. |
| cosineDistance(a, b) | Calculates cosine distance (0.0 to 2.0). |
| euclideanDistance(a, b) | Calculates Euclidean (L2) distance. |
| similaritySearch(query, docs) | Returns the indices of the Top-K most similar vectors in a database. |
| similaritySearchWithThreshold | Returns all indices with similarity above a given threshold. |
| quantizeToInt8(vector) | Compresses a Float32List into an Int8List (4x memory savings). |
| normalize(vector) | Applies L2 normalization to a vector. |
| meanPooling(vectors) | Averages multiple vectors into a single vector. |
| toBase64 / fromBase64 | Serializes/Deserializes a vector to/from a Base64 string for DB storage. |
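To show where the 4x saving comes from and what Base64 storage involves, here is a minimal sketch of symmetric scalar quantization and a Base64 round trip in plain Dart. The scaling scheme (largest absolute value maps to 127) and the helper names `quantize`, `vectorToBase64`, and `vectorFromBase64` are assumptions for illustration; the exact scheme and byte layout used by Model2VecUtils are not documented here.

```dart
import 'dart:convert';
import 'dart:typed_data';

/// Symmetric scalar quantization: maps [-maxAbs, maxAbs] onto [-127, 127].
/// Returns the quantized values and the scale needed to restore them.
(Int8List, double) quantize(Float32List v) {
  var maxAbs = 0.0;
  for (final x in v) {
    if (x.abs() > maxAbs) maxAbs = x.abs();
  }
  final scale = maxAbs == 0 ? 1.0 : maxAbs / 127.0;
  final q = Int8List(v.length);
  for (var i = 0; i < v.length; i++) {
    q[i] = (v[i] / scale).round().clamp(-127, 127).toInt();
  }
  return (q, scale);
}

/// Encodes the raw float bytes (host byte order) as Base64.
String vectorToBase64(Float32List v) =>
    base64Encode(v.buffer.asUint8List(v.offsetInBytes, v.lengthInBytes));

/// Decodes a string produced by [vectorToBase64] on a same-endian machine.
Float32List vectorFromBase64(String s) {
  // Copy into a fresh buffer so the float view starts at offset 0.
  return Uint8List.fromList(base64Decode(s)).buffer.asFloat32List();
}
```

Each float takes 4 bytes and each quantized value takes 1, hence the 4x reduction. Multiplying a quantized value by the scale recovers the original to within one quantization step.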
Performance
model2vec uses highly optimized FFI bindings. For mathematical operations on embeddings, Dart handles single-vector math natively with zero overhead, while batch generation leverages Rust's SIMD (auto-vectorization) capabilities.
Here is a performance benchmark run on a typical machine (AOT compiled):
| Model | Load Time (Cache) | Single Embedding | Batch (32) |
|---|---|---|---|
| minishlab/potion-base-2M | ~40 ms | 372.9 μs | 3.85 ms |
| minishlab/potion-base-4M | ~40 ms | 363.7 μs | 4.19 ms |
| minishlab/potion-base-8M | ~40 ms | 382.1 μs | 5.60 ms |
| minishlab/potion-base-32M | ~120 ms | 452.6 μs | 6.79 ms |
| minishlab/potion-multilingual-128M | ~1050 ms | 416.1 μs | 5.38 ms |
Note: Initial load times may vary slightly based on disk speed. Generating a single embedding takes a few hundred microseconds; in a batch of 32, the cost works out to roughly 120-210 μs per string. similaritySearch over 100,000 vectors takes <100ms in pure Dart.
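A brute-force Top-K search of the kind timed above can be sketched as a dot-product scan followed by a sort. The version below assumes the vectors are already L2-normalized, so the dot product equals cosine similarity; the names `dot` and `topK` are hypothetical and this is not the package's implementation.

```dart
import 'dart:typed_data';

/// Dot product of two equal-length vectors.
double dot(Float32List a, Float32List b) {
  var sum = 0.0;
  for (var i = 0; i < a.length; i++) {
    sum += a[i] * b[i];
  }
  return sum;
}

/// Returns the indices of the [k] highest-scoring vectors, best first.
List<int> topK(Float32List query, List<Float32List> docs, int k) {
  final scores = [for (final d in docs) dot(query, d)];
  final order = List<int>.generate(docs.length, (i) => i)
    ..sort((a, b) => scores[b].compareTo(scores[a]));
  return order.take(k).toList();
}
```

Scoring is linear in the number of vectors and the sort adds an O(n log n) step; a partial selection would avoid sorting everything when k is small.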
Development & Contributing
The library uses Dart Native Assets, meaning cargo build is invoked automatically when running Dart code.
To manually re-build bindings if you modify the Rust C-API (native/src/lib.rs):
dart run ffigen
To run the test suite:
dart test
License
This project is licensed under the MIT License - see the LICENSE file for details.