extractKeyphrases function
List<Keyphrase>
extractKeyphrases(
- String doc,
- List<List<String>> corpus, [
- KeyphraseOptions options = const KeyphraseOptions(),
])
Extracts the top-K tf*idf keyphrases from doc against corpus.
corpus supplies idf weights; pass an empty corpus for single-document mode
(the tokenized doc is always added to the corpus before idf is computed, and
idf stays positive via smoothing). Results are ordered by descending score,
with ties broken by ascending phrase, giving a deterministic order. Returns
an empty list when doc yields no tokens.
Example:
extractKeyphrases('cat cat dog', <List<String>>[]); // [(phrase: 'cat', ...)]
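The ordering rule described above (descending score, ties broken by ascending phrase) can be sketched as follows. This is only an illustration under stated assumptions: `ScoredPhrase` and `rankTopK` are hypothetical names, not the library's `Keyphrase` type or its private `_rankTopK` helper, and the record syntax assumes Dart 3.

```dart
/// Hypothetical stand-in for the library's Keyphrase type (assumption).
typedef ScoredPhrase = ({String phrase, double score});

/// Orders phrases by descending score, then ascending phrase, and keeps
/// the first [k]. Map keys are unique, so no two entries compare equal and
/// the result does not depend on the stability of the sort.
List<ScoredPhrase> rankTopK(Map<String, double> scores, int k) {
  final List<ScoredPhrase> ranked = <ScoredPhrase>[
    for (final MapEntry<String, double> e in scores.entries)
      (phrase: e.key, score: e.value),
  ];
  ranked.sort((ScoredPhrase a, ScoredPhrase b) {
    // Higher score first.
    final int byScore = b.score.compareTo(a.score);
    // Equal scores fall back to lexicographic phrase order.
    return byScore != 0 ? byScore : a.phrase.compareTo(b.phrase);
  });
  return ranked.take(k).toList();
}
```

With scores `{'dog': 1.0, 'cat': 1.0, 'emu': 2.0}` and `k = 2`, this sketch places `emu` first and then `cat`, because `cat` sorts before `dog` when their scores tie.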
Audited: 2026-06-12 11:26 EDT
Implementation
List<Keyphrase> extractKeyphrases(
  String doc,
  List<List<String>> corpus, [
  KeyphraseOptions options = const KeyphraseOptions(),
]) {
  final List<String> tokens = tokenizeKeyphrases(doc);
  if (tokens.isEmpty) return <Keyphrase>[];
  // The target doc must influence its own idf; without it, a corpus that never
  // saw these terms would leave them all with the same df and flatten scores.
  final List<List<String>> effective = <List<String>>[...corpus, tokens];
  final Map<String, double> idf = computeIdf(effective);
  final Map<String, double> scores = _scoreTerms(tokens, idf, options);
  return _rankTopK(scores, options.topK);
}
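The source does not show `computeIdf`, so the following is a minimal sketch of one common smoothing scheme, not the library's actual formula. It assumes idf(t) = ln((1 + N) / (1 + df(t))) + 1, where N is the number of documents and df(t) the number of documents containing t; `smoothedIdf` is a hypothetical name. Because df(t) never exceeds N, the logarithm is never negative and every weight is at least 1, which is one way idf can stay positive in single-document mode.

```dart
import 'dart:math' as math;

/// Hypothetical smoothed idf (assumption, not the library's computeIdf):
/// idf(t) = ln((1 + N) / (1 + df(t))) + 1.
Map<String, double> smoothedIdf(List<List<String>> docs) {
  // Document frequency: count each term once per document.
  final Map<String, int> df = <String, int>{};
  for (final List<String> doc in docs) {
    for (final String term in doc.toSet()) {
      df[term] = (df[term] ?? 0) + 1;
    }
  }
  final int n = docs.length;
  return <String, double>{
    for (final MapEntry<String, int> e in df.entries)
      e.key: math.log((1 + n) / (1 + e.value)) + 1,
  };
}
```

Under this assumed formula, a corpus consisting only of the target document's tokens gives every term a weight of exactly 1, so ranking in that mode is driven by term frequency alone.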