string/keyphrase_utils library
TF-IDF keyphrase extraction over a small corpus — roadmap #432.
Classes
- KeyphraseOptions
- Options controlling keyphrase extraction; grouped to keep the public API under the 3-parameter limit and allow future knobs without breaking callers.
Functions
- computeIdf(List<List<String>> corpus) → Map<String, double>
- Computes inverse document frequency for every term across corpus.
- extractKeyphrases(String doc, List<List<String>> corpus, [KeyphraseOptions options = const KeyphraseOptions()]) → List<Keyphrase>
- Extracts the top-K tf*idf keyphrases from doc against corpus.
- termFrequencies(List<String> tokens) → Map<String, int>
- Counts how many times each token appears in tokens (term frequency).
- tokenizeKeyphrases(String text) → List<String>
- Splits text into lowercase alphanumeric tokens, dropping stopwords.
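
The four functions listed above describe a tokenize, count, weight, rank pipeline. The following is a minimal sketch of how such a pipeline could fit together, not the library's actual implementation. Every name prefixed with sketch is hypothetical, as are the stopword list, the topK parameter, and the smoothed IDF form ln((1 + N) / (1 + df)) + 1; the library may use different choices for all of these. Because the fields of Keyphrase and KeyphraseOptions are not shown on this page, the sketch returns plain MapEntry pairs instead.

```dart
import 'dart:math' as math;

// Hypothetical stopword list; the library's real list is not documented here.
const Set<String> sketchStopwords = {
  'a', 'an', 'and', 'the', 'of', 'in', 'on', 'to', 'is',
};

/// Lowercases [text], keeps runs of ASCII letters and digits, and drops
/// anything found in [sketchStopwords].
List<String> sketchTokenize(String text) {
  return RegExp(r'[a-z0-9]+')
      .allMatches(text.toLowerCase())
      .map((m) => m.group(0)!)
      .where((t) => !sketchStopwords.contains(t))
      .toList();
}

/// Counts how many times each token occurs in [tokens].
Map<String, int> sketchTermFrequencies(List<String> tokens) {
  final counts = <String, int>{};
  for (final t in tokens) {
    counts[t] = (counts[t] ?? 0) + 1;
  }
  return counts;
}

/// Inverse document frequency per term over an already tokenized [corpus].
/// Assumes the smoothed form ln((1 + N) / (1 + df)) + 1.
Map<String, double> sketchComputeIdf(List<List<String>> corpus) {
  final df = <String, int>{};
  for (final doc in corpus) {
    // toSet() so a term counts once per document, however often it repeats.
    for (final term in doc.toSet()) {
      df[term] = (df[term] ?? 0) + 1;
    }
  }
  final n = corpus.length;
  return df.map(
    (term, d) => MapEntry(term, math.log((1 + n) / (1 + d)) + 1),
  );
}

/// Returns the [topK] terms of [doc] ranked by tf * idf, highest first.
/// Ties are broken alphabetically so the order is deterministic.
List<MapEntry<String, double>> sketchTopKeyphrases(
  String doc,
  List<List<String>> corpus, {
  int topK = 5,
}) {
  final idf = sketchComputeIdf(corpus);
  // Terms absent from the corpus get the df = 0 value of the same formula.
  final unseenIdf = math.log(1 + corpus.length) + 1;
  final scored = sketchTermFrequencies(sketchTokenize(doc))
      .entries
      .map((e) => MapEntry(e.key, e.value * (idf[e.key] ?? unseenIdf)))
      .toList()
    ..sort((a, b) {
      final byScore = b.value.compareTo(a.value);
      return byScore != 0 ? byScore : a.key.compareTo(b.key);
    });
  return scored.take(topK).toList();
}
```

In this sketch the corpus is passed as token lists, matching the List<List<String>> parameter in the signatures above, while the query document is passed as raw text and tokenized internally. A term that is frequent in the document but rare across the corpus ranks highest.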