String Search Algorithms
Simple Dart library for string similarity and substring search. Pure Dart, null-safe, with configurable engines and caching for repeated comparisons.
Highlights
- Similarity metrics: Levenshtein, Jaro-Winkler, Cosine, Jaccard, and more.
- Composite "smart" matching: a calibrated ensemble that stays robust across typos, word reordering, and containment - and is safe to rank with.
- Search algorithms: KMP, Boyer-Moore, Rabin-Karp, and a standard wrapper.
- Instance-based engines with configurable normalization and caching.
- Compiled patterns for efficient repeated substring searches.
- Extension methods for ergonomic usage on String.
Installation
Add this to your pubspec.yaml:
dependencies:
string_search_algorithms: ^1.0.1
Quick start
import 'package:string_search_algorithms/string_search_algorithms.dart';
void main() {
final score = StringSimilarity.compare(
'Dwayne',
'Duane',
algorithm: SimilarityAlgorithm.jaroWinkler,
);
print('Similarity: $score');
final index = StringSearch.indexOf(
'The quick brown fox jumps over the lazy dog',
'brown',
algorithm: SearchAlgorithm.boyerMoore,
);
print('Index: $index');
}
Similarity
Static helpers
final score = StringSimilarity.compare(
'kitten',
'sitting',
algorithm: SimilarityAlgorithm.levenshtein,
);
final details = StringSimilarity.compareWithDetails(
'Dwayne',
'Duane',
algorithm: SimilarityAlgorithm.jaroWinkler,
);
Engine with options
Use StringSimilarityEngine for per-instance configuration and caching.
final engine = StringSimilarityEngine(
options: const SimilarityOptions(
normalization: NormalizationOptions(
toLowerCase: true,
removeAccents: true,
removeSpecialChars: true,
trimWhitespace: true,
),
cache: CacheOptions(
enabled: true,
normalizedCapacity: 1000,
bigramCapacity: 1000,
ngramCapacity: 1000,
),
algorithms: AlgorithmOptions(
ngramSize: 3,
tverskyAlpha: 0.5,
tverskyBeta: 0.5,
jaroWinklerPrefixScale: 0.1,
jaroWinklerBoostThreshold: 0.7,
),
),
);
final score = engine.compare(
'Cafe!',
'cafe',
algorithm: SimilarityAlgorithm.levenshtein,
);
Fuzzy matching
final candidates = ['apple', 'banana', 'orange', 'grape'];
final matches = StringSimilarity.findMatches(
'appel',
candidates,
minScore: 0.5,
);
for (final match in matches) {
print('${match.value}: ${match.score}');
}
Composite (smart) similarity
When you do not want to pick a single algorithm, use
SimilarityAlgorithm.composite. It combines complementary witnesses
(Jaro-Winkler, Dice, Cosine, Overlap) into one stable score that handles typos,
word reordering, and containment at once. Unlike picking an algorithm per input,
it produces comparable scores, so it is safe to use with findMatches and
rankByRelevance.
StringSimilarity.compare(
'John Smith',
'Smith John',
algorithm: SimilarityAlgorithm.composite,
); // high - word reordering handled
Token-based witnesses respect normalization, so enable
NormalizationOptions(removeSpecialChars: true) when inputs carry punctuation
(for example 'Smith, John').
Each witness is calibrated against an empirical noise floor before combining, so
unrelated inputs collapse toward 0. The defaults can be tuned via
CompositeOptions, and compareWithDetails explains the result:
final result = StringSimilarity.compareWithDetails(
'John Smith',
'Smith John',
algorithm: SimilarityAlgorithm.composite,
);
print(result.metadata['witnesses']); // per-witness sub-scores
print(result.metadata['dominant']); // which witness drove the score
Substring search
Basic search
final text = 'The quick brown fox jumps over the lazy dog';
final index = StringSearch.indexOf(
text,
'brown',
algorithm: SearchAlgorithm.boyerMoore,
);
Compiled patterns
final pattern = StringSearch.compile(
'fox',
algorithm: SearchAlgorithm.kmp,
);
if (pattern.containsIn(text)) {
print('Found!');
}
for (final match in pattern.findAllIn(text)) {
print('Found at ${match.index}');
}
Configuration
NormalizationOptionscontrols trimming, case folding, accent removal, and custom preprocessors/postprocessors.CacheOptionssizes the normalized string, bigram, and n-gram caches.AlgorithmOptionstunes Jaro-Winkler, N-gram size, and Tversky parameters.CompositeOptionstunes the composite combiner, witness weights, and the per-witness calibration floors.
See the API docs for full option details.
Benchmarks
Benchmark scripts live in benchmark/:
dart run benchmark/similarity_benchmark.dart
dart run benchmark/search_benchmark.dart
dart run benchmark/composite_calibration.dart # re-derive composite floors
API reference
API docs will be available on pub.dev: https://pub.dev/documentation/string_search_algorithms/latest/
Contributing
Contributions are welcome. Please read CONTRIBUTING.md and open a PR.
License
Licensed under the MIT License. See LICENSE.
Libraries
- search
- Substring search APIs and algorithms.
- similarity
- String similarity and fuzzy matching APIs.
- string_search_algorithms
- High-performance string similarity and substring search algorithms for Dart.