string/keyphrase_utils library

TF-IDF keyphrase extraction over a small corpus (roadmap #432).

Classes

KeyphraseOptions
Options controlling keyphrase extraction, grouped into one object so the public API stays within the 3-parameter limit and new options can be added later without breaking existing callers.
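A minimal sketch of how such an options object might look, assuming Dart 3. The page does not list the real fields, so `topK` and `minScore` and their defaults are hypothetical. A const constructor with named, defaulted parameters is what lets `const KeyphraseOptions()` serve as the default argument of `extractKeyphrases`. A field added later in the same style leaves existing call sites compiling.

```dart
/// Hypothetical sketch: field names and defaults are assumptions,
/// not the library's documented API.
class KeyphraseOptions {
  /// Maximum number of keyphrases to return (hypothetical).
  final int topK;

  /// Phrases scoring below this are dropped (hypothetical; stands in for
  /// a knob that could be added later without breaking callers).
  final double minScore;

  const KeyphraseOptions({this.topK = 5, this.minScore = 0.0});
}

// Calls written before `minScore` existed still compile unchanged.
const defaults = KeyphraseOptions();
const fewer = KeyphraseOptions(topK: 3);
const strict = KeyphraseOptions(topK: 3, minScore: 0.1);
```

Because the constructor is const, every `const KeyphraseOptions()` is canonicalized to a single shared instance.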

Functions

computeIdf(List<List<String>> corpus) → Map<String, double>
Computes the inverse document frequency of every term in corpus.
extractKeyphrases(String doc, List<List<String>> corpus, [KeyphraseOptions options = const KeyphraseOptions()]) → List<Keyphrase>
Extracts the top-K keyphrases from doc, ranked by tf*idf score against corpus.
termFrequencies(List<String> tokens) → Map<String, int>
Counts how many times each token appears in tokens (term frequency).
tokenizeKeyphrases(String text) → List<String>
Splits text into lowercase alphanumeric tokens, dropping stopwords.

Typedefs

Keyphrase = ({String phrase, double score})
A scored keyphrase: the phrase text and its tf*idf score.
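`Keyphrase` is a Dart 3 record type with named fields. Values can therefore be built, sorted, compared and destructured without declaring a class. This is a minimal sketch: `compareKeyphrases`, `describe` and the sample scores are hypothetical and not part of the documented library.

```dart
typedef Keyphrase = ({String phrase, double score});

/// Hypothetical helper: highest score first, ties broken alphabetically.
int compareKeyphrases(Keyphrase a, Keyphrase b) {
  final byScore = b.score.compareTo(a.score);
  return byScore != 0 ? byScore : a.phrase.compareTo(b.phrase);
}

/// Hypothetical helper: destructures the record's named fields.
String describe(Keyphrase k) {
  final (:phrase, :score) = k;
  return '$phrase (${score.toStringAsFixed(2)})';
}

// Illustrative values only; sorted in place by the cascade.
final ranked = <Keyphrase>[
  (phrase: 'sat', score: 1.0),
  (phrase: 'mat', score: 2.5),
  (phrase: 'cat', score: 1.0),
]..sort(compareKeyphrases);
```

Records compare by value. Two `Keyphrase` values with the same phrase and score are `==`, which makes them convenient in test assertions and as map keys.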