computeIdf function
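Computes the inverse document frequency (idf) of every term in the corpus.
Uses the smoothed form ln(1 + N / df), where N is the number of documents
and df is the number of documents that contain the term. The argument of the
logarithm is always greater than 1, so every term keeps a positive weight,
even one that appears in every document (ln 2); rarer terms score higher.
Only terms that occur in the corpus are returned, so df is at least 1 and the
division is always defined. An empty corpus yields an empty map.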
Example:
computeIdf([['cat', 'dog'], ['cat']])['cat']; // ln(1 + 2/2) = ln 2
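To make the effect of the smoothing concrete, the sketch below compares ln(1 + N / df) with the unsmoothed ln(N / df) for a few document frequencies. It is an illustration only: the helper names smoothedIdf and plainIdf and the corpus size of 4 are assumptions made for this example, not part of computeIdf.

```dart
import 'dart:math' as math;

/// Smoothed idf as described above: ln(1 + N / df).
/// Hypothetical helper, not part of the documented API.
double smoothedIdf(int n, int df) => math.log(1 + n / df);

/// Unsmoothed idf, ln(N / df), included only for comparison.
/// Hypothetical helper, not part of the documented API.
double plainIdf(int n, int df) => math.log(n / df);

void main() {
  const int n = 4; // assumed corpus size for the illustration
  for (final int df in <int>[1, 2, 4]) {
    print('df=$df  smoothed=${smoothedIdf(n, df).toStringAsFixed(3)}  '
        'plain=${plainIdf(n, df).toStringAsFixed(3)}');
  }
  // df=1  smoothed=1.609  plain=1.386
  // df=2  smoothed=1.099  plain=0.693
  // df=4  smoothed=0.693  plain=0.000
}
```

With the unsmoothed form, a term present in all four documents drops to zero weight, whereas the smoothed form leaves it at ln 2.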
Audited: 2026-06-12 11:26 EDT
Implementation
import 'dart:math' as math;

Map<String, double> computeIdf(List<List<String>> corpus) {
  // Document frequency: number of docs containing each term at least once.
  final Map<String, int> df = <String, int>{};
  for (final List<String> doc in corpus) {
    // toSet() collapses repeats so each doc counts a term only once.
    for (final String term in doc.toSet()) {
      df[term] = (df[term] ?? 0) + 1;
    }
  }
  final int n = corpus.length;
  // Smoothed idf: n / count is a double division in Dart, and count >= 1.
  return df.map(
    (String term, int count) =>
        MapEntry<String, double>(term, math.log(1 + n / count)),
  );
}