String Search Algorithms

Simple Dart library for string similarity and substring search. Pure Dart, null-safe, with configurable engines and caching for repeated comparisons.

Highlights

  • Similarity metrics: Levenshtein, Jaro-Winkler, Cosine, Jaccard, and more.
  • Search algorithms: KMP, Boyer-Moore, Rabin-Karp, and a standard wrapper.
  • Instance-based engines with configurable normalization and caching.
  • Compiled patterns for efficient repeated substring searches.
  • Extension methods for ergonomic usage on String.

Installation

Add this to your pubspec.yaml:

dependencies:
  string_search_algorithms: ^1.0.1

Quick start

import 'package:string_search_algorithms/string_search_algorithms.dart';

void main() {
  final score = StringSimilarity.compare(
    'Dwayne',
    'Duane',
    algorithm: SimilarityAlgorithm.jaroWinkler,
  );
  print('Similarity: $score');

  final index = StringSearch.indexOf(
    'The quick brown fox jumps over the lazy dog',
    'brown',
    algorithm: SearchAlgorithm.boyerMoore,
  );
  print('Index: $index');
}

Similarity

Static helpers

final score = StringSimilarity.compare(
  'kitten',
  'sitting',
  algorithm: SimilarityAlgorithm.levenshtein,
);

final details = StringSimilarity.compareWithDetails(
  'Dwayne',
  'Duane',
  algorithm: SimilarityAlgorithm.jaroWinkler,
);

Engine with options

Use StringSimilarityEngine for per-instance configuration and caching.

final engine = StringSimilarityEngine(
  options: const SimilarityOptions(
    normalization: NormalizationOptions(
      toLowerCase: true,
      removeAccents: true,
      removeSpecialChars: true,
      trimWhitespace: true,
    ),
    cache: CacheOptions(
      enabled: true,
      normalizedCapacity: 1000,
      bigramCapacity: 1000,
      ngramCapacity: 1000,
    ),
    algorithms: AlgorithmOptions(
      ngramSize: 3,
      tverskyAlpha: 0.5,
      tverskyBeta: 0.5,
      jaroWinklerPrefixScale: 0.1,
      jaroWinklerBoostThreshold: 0.7,
    ),
  ),
);

final score = engine.compare(
  'Cafe!',
  'cafe',
  algorithm: SimilarityAlgorithm.levenshtein,
);

Fuzzy matching

final candidates = ['apple', 'banana', 'orange', 'grape'];
final matches = StringSimilarity.findMatches(
  'appel',
  candidates,
  minScore: 0.5,
);

for (final match in matches) {
  print('${match.value}: ${match.score}');
}
final text = 'The quick brown fox jumps over the lazy dog';
final index = StringSearch.indexOf(
  text,
  'brown',
  algorithm: SearchAlgorithm.boyerMoore,
);

Compiled patterns

final pattern = StringSearch.compile(
  'fox',
  algorithm: SearchAlgorithm.kmp,
);

if (pattern.containsIn(text)) {
  print('Found!');
}

for (final match in pattern.findAllIn(text)) {
  print('Found at ${match.index}');
}

Configuration

  • NormalizationOptions controls trimming, case folding, accent removal, and custom preprocessors/postprocessors.
  • CacheOptions sizes the normalized string, bigram, and n-gram caches.
  • AlgorithmOptions tunes Jaro-Winkler, N-gram size, and Tversky parameters.

See the API docs for full option details.

Benchmarks

Benchmark scripts live in benchmark/:

dart run benchmark/similarity_benchmark.dart
dart run benchmark/search_benchmark.dart

API reference

API docs will be available on pub.dev: https://pub.dev/documentation/string_search_algorithms/latest/

Contributing

Contributions are welcome. Please read CONTRIBUTING.md and open a PR.

License

Licensed under the MIT License. See LICENSE.

Libraries

Substring search APIs and algorithms.
similarity
String similarity and fuzzy matching APIs.
string_search_algorithms
High-performance string similarity and substring search algorithms for Dart.