aho_corasick 1.1.0 copy "aho_corasick: ^1.1.0" to clipboard
aho_corasick: ^1.1.0 copied to clipboard

outdated

Pattern Matching implementation of aho_corasick using a state machine to find all matches in a text for a large number of words.

Implementation of the Aho-Corasick algorithm for finding words in a text. The Aho-Corasick algorithm is especially fast if you have a large list of words that might have some matches in your text.

Usage #

You can either search for all or for the first match with the optional flag to search for the first, but longest match.

import 'package:aho_corasick/aho_corasick.dart';

main() {
  final aho = AhoCorasick.fromWordList(['abc', 'bcd', 'bcde']);
  final results = aho.matches('search in abcd');
  print(results
      .map((match) => 'found ${match.word} at ${match.startIndex}')
      .join('\n'));
  // > found abc at 10
  // > found bcd at 11

  final longest = aho.firstMatch('search bcde', longest: true);
  print(longest.word); // > bcde
}

Some Technical Details #

It creates a state machine for all the words with a failure mechanism, if a word does not match it can easily find other words whose prefix was just read by the algorithm. Therefore it has a initialization that is proportional to the number of words and the characters used. The Search phase is usually linear for "normal texts". Edge cases like "only one character" can lead to a quadratic time complexity, but that is only because the number of results can be quadratic in the length of the text.

More information at: https://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_algorithm

Sadly the original paper is not open access. But there is ample material elsewhere.

4
likes
40
pub points
51%
popularity

Publisher

unverified uploader

Pattern Matching implementation of aho_corasick using a state machine to find all matches in a text for a large number of words.

Repository (GitHub)
View/report issues

License

BSD-3-Clause (LICENSE)

Dependencies

meta

More

Packages that depend on aho_corasick