extractKeyphrases function

List<Keyphrase> extractKeyphrases(
  String doc,
  List<List<String>> corpus, [
  KeyphraseOptions options = const KeyphraseOptions(),
])

Extracts the top-K tf*idf keyphrases from doc against corpus.

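corpus supplies the document frequencies behind the idf weights. The tokens of doc are always appended to corpus before idf is computed, so pass an empty list for single-document mode (idf stays positive via smoothing). Results are ordered by descending score, with ties broken by ascending phrase, giving a stable, deterministic order. Returns an empty list when doc yields no tokens.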

Example:

extractKeyphrases('cat cat dog', <List<String>>[]); // [(phrase: 'cat', ...)]
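
The smoothing remark can be illustrated with a minimal sketch. The formula used here, ln((1 + N) / (1 + df)) + 1, is one common smoothed idf and is an assumption: the formula inside computeIdf is not shown on this page. The helper name smoothedIdf is hypothetical and not part of the library.

```dart
import 'dart:math' as math;

/// Hypothetical helper: smoothed idf over an already tokenized corpus.
/// Assumes idf(t) = ln((1 + N) / (1 + df(t))) + 1, which may differ from
/// what computeIdf actually implements.
Map<String, double> smoothedIdf(List<List<String>> docs) {
  final Map<String, int> df = <String, int>{};
  for (final List<String> doc in docs) {
    // Count each term once per document, however often it repeats.
    for (final String term in doc.toSet()) {
      df[term] = (df[term] ?? 0) + 1;
    }
  }
  final int n = docs.length;
  return <String, double>{
    for (final MapEntry<String, int> e in df.entries)
      e.key: math.log((1 + n) / (1 + e.value)) + 1,
  };
}

void main() {
  // Single-document mode: the effective corpus holds only the target doc.
  final Map<String, double> idf = smoothedIdf(<List<String>>[
    <String>['cat', 'cat', 'dog'],
  ]);
  // Each term has df = 1 and N = 1, so each weight is ln(2 / 2) + 1 = 1.0.
  print(idf);
}
```

Under this assumed formula every idf weight equals 1.0 in single-document mode, so the ranking there is driven by term frequency alone, which is consistent with 'cat' leading the example result.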

Audited: 2026-06-12 11:26 EDT

Implementation

List<Keyphrase> extractKeyphrases(
  String doc,
  List<List<String>> corpus, [
  KeyphraseOptions options = const KeyphraseOptions(),
]) {
  final List<String> tokens = tokenizeKeyphrases(doc);
  if (tokens.isEmpty) return <Keyphrase>[];
  // The target doc must influence its own idf; without it, a corpus that never
  // saw these terms would leave them all with the same df and flatten scores.
  final List<List<String>> effective = <List<String>>[...corpus, tokens];
  final Map<String, double> idf = computeIdf(effective);
  final Map<String, double> scores = _scoreTerms(tokens, idf, options);
  return _rankTopK(scores, options.topK);
}
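
The documented ordering (descending score, ties broken by ascending phrase) can be sketched as follows. This is an illustrative stand-in, not the library's _rankTopK, whose body is not shown here; the name rankTopK and the use of MapEntry in place of Keyphrase are assumptions made to keep the example self-contained.

```dart
/// Hypothetical stand-in for the ranking step; not the library's _rankTopK.
/// Returns the k best entries, highest score first, ties by phrase A to Z.
List<MapEntry<String, double>> rankTopK(Map<String, double> scores, int k) {
  final List<MapEntry<String, double>> entries = scores.entries.toList()
    ..sort((a, b) {
      // Higher score first.
      final int byScore = b.value.compareTo(a.value);
      // Equal scores fall back to ascending phrase order.
      return byScore != 0 ? byScore : a.key.compareTo(b.key);
    });
  // take() tolerates k larger than the number of entries.
  return entries.take(k).toList();
}

void main() {
  final List<MapEntry<String, double>> ranked = rankTopK(
    <String, double>{'dog': 1.0, 'cat': 2.0, 'ant': 1.0},
    2,
  );
  print(ranked.map((e) => e.key).toList()); // [cat, ant]
}
```

Because map keys are unique, the comparator never returns 0 for two distinct entries, so the result does not depend on whether the underlying sort is stable.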