computeIdf function

Map<String, double> computeIdf(List<List<String>> corpus)

Computes the inverse document frequency (IDF) of every term that appears in corpus, where each document is a list of term strings.

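Uses the smoothed form idf(t) = ln(1 + N / df(t)), where N is the number of documents in corpus and df(t) is the number of documents containing term t. Every term in the result occurs in at least one document, so df(t) >= 1 and the division is always defined. The added 1 keeps every weight positive: a term found in every document, including the only document of a single-document corpus, scores ln 2 rather than the 0 that the unsmoothed ln(N / df(t)) would give. Rarer terms score higher. Terms that occur in no document have no entry in the returned map.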

Example:

computeIdf([['cat', 'dog'], ['cat']])['cat']; // ln(1 + 2/2) = ln 2
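
A slightly longer sketch may help show how the weights compare across terms and what happens at the edges. It repeats the function from the Implementation section so that it runs on its own; the three-document corpus and the term 'fish' are made up for illustration.

```dart
import 'dart:math' as math;

// Copy of computeIdf from the Implementation section, repeated here
// only so that this sketch is self-contained.
Map<String, double> computeIdf(List<List<String>> corpus) {
  final Map<String, int> df = <String, int>{};
  for (final List<String> doc in corpus) {
    for (final String term in doc.toSet()) {
      df[term] = (df[term] ?? 0) + 1;
    }
  }
  final int n = corpus.length;
  return df.map(
    (String term, int count) =>
        MapEntry<String, double>(term, math.log(1 + n / count)),
  );
}

void main() {
  // Hypothetical corpus; 'cat' is repeated inside the first document,
  // but each document counts a term at most once.
  final List<List<String>> corpus = <List<String>>[
    <String>['cat', 'cat', 'dog'],
    <String>['cat', 'bird'],
    <String>['cat'],
  ];
  final Map<String, double> idf = computeIdf(corpus);

  // 'cat' is in all 3 documents: ln(1 + 3/3) = ln 2, still positive.
  print(idf['cat'] == math.log(2)); // true
  // 'dog' is in 1 document: ln(1 + 3/1) = ln 4, the largest weight here.
  print(idf['dog'] == math.log(4)); // true
  // A term that never occurs has no entry, so the lookup returns null.
  print(idf['fish']); // null
  // An empty corpus yields an empty map.
  print(computeIdf(<List<String>>[]).isEmpty); // true
}
```

Because the lookup returns a nullable double, callers that may see unknown terms would need to choose their own fallback weight; the function itself does not define one.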

Audited: 2026-06-12 11:26 EDT

Implementation

import 'dart:math' as math; // Provides math.log (natural logarithm).

Map<String, double> computeIdf(List<List<String>> corpus) {
  // Document frequency: number of docs containing each term at least once.
  final Map<String, int> df = <String, int>{};
  for (final List<String> doc in corpus) {
    for (final String term in doc.toSet()) {
      df[term] = (df[term] ?? 0) + 1;
    }
  }
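  // Total number of documents. In Dart, n / count is double division,
  // and count >= 1 for every term in df, so the quotient is always finite.
  final int n = corpus.length;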
  return df.map(
    (String term, int count) => MapEntry<String, double>(term, math.log(1 + n / count)),
  );
}