languageTrigrams function
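Returns the distinct character trigrams of text, ranked most frequent first.
Returns an empty list when the normalized text is shorter than _kMinTrigrams characters.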
The input is lowercased, runs of whitespace are collapsed to a single space,
and leading and trailing whitespace is trimmed, so casing and spacing do not
fragment the profile. Interior spaces are kept as trigram characters because
word boundaries carry strong language signal (e.g. 'the' vs ' th').
Example:
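languageTrigrams('the the'); // ['the', ...]: 'the' occurs twice and ranks first;
// 'he ', 'e t' and ' th' tie at one occurrence each, and List.sort does not guarantee their relative order.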
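The normalization and windowing described above can be sketched in isolation. This is a minimal illustration, not the library's code: normalize and windows are hypothetical helper names introduced here, and the sketch skips the counting and ranking steps.

```dart
// Hypothetical helper: the same normalization the description names
// (lowercase, collapse whitespace runs to one space, trim the ends).
String normalize(String text) =>
    text.toLowerCase().replaceAll(RegExp(r'\s+'), ' ').trim();

// Hypothetical helper: every length-3 window of [s], in order, duplicates kept.
List<String> windows(String s) => <String>[
      for (int i = 0; i + 3 <= s.length; i++) s.substring(i, i + 3),
    ];

void main() {
  print(normalize('  The\tQuick\n\nBROWN  fox ')); // the quick brown fox
  // Windows that straddle the space ('e t', ' th') capture the word boundary.
  print(windows('the the')); // [the, he , e t,  th, the]
}
```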
Audited: 2026-06-12 11:26 EDT
Implementation
List<String> languageTrigrams(String text) {
  final String clean =
      text.toLowerCase().replaceAll(RegExp(r'\s+'), ' ').trim();
  if (clean.length < _kMinTrigrams) return <String>[];
  // Count every length-3 window, then rank by descending frequency so the most
  // characteristic trigrams dominate the out-of-place comparison.
  final Map<String, int> counts = <String, int>{};
  for (int i = 0; i + 3 <= clean.length; i++) {
    final String gram = clean.substring(i, i + 3);
    counts[gram] = (counts[gram] ?? 0) + 1;
  }
  // Read counts with ?? 0 so the sort never uses a bare ! on the map lookup;
  // every key came from the same map, so 0 is an unreachable but safe fallback.
  return counts.keys.toList()
    ..sort((a, b) => (counts[b] ?? 0).compareTo(counts[a] ?? 0));
}
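The implementation comment mentions an out-of-place comparison but this page does not show it. The following is a sketch of one common form of that measure, under stated assumptions: outOfPlaceDistance is a hypothetical name, the penalty for a missing trigram is assumed to default to the reference profile's length, and the two profiles in main are toy data, not real language statistics.

```dart
/// Hypothetical out-of-place distance between two ranked trigram lists.
/// For each trigram in [doc], adds the absolute difference between its rank
/// in [doc] and its rank in [reference]; a trigram missing from [reference]
/// adds [maxPenalty] (assumed here to default to reference.length).
int outOfPlaceDistance(List<String> doc, List<String> reference,
    {int? maxPenalty}) {
  final int penalty = maxPenalty ?? reference.length;
  // Map each reference trigram to its rank for constant-time lookup.
  final Map<String, int> refRank = <String, int>{
    for (int i = 0; i < reference.length; i++) reference[i]: i,
  };
  int distance = 0;
  for (int i = 0; i < doc.length; i++) {
    final int? rank = refRank[doc[i]];
    distance += rank == null ? penalty : (rank - i).abs();
  }
  return distance;
}

void main() {
  // Toy profiles for illustration only.
  final List<String> reference = <String>['the', ' th', 'he ', 'ing'];
  final List<String> sample = <String>['the', 'he ', ' th', 'xyz'];
  print(outOfPlaceDistance(sample, reference)); // 0 + 1 + 1 + 4 = 6
}
```

A lower distance means the sample's ranking is closer to the reference profile. In a classifier built this way, the ranked list returned by languageTrigrams would be passed as doc and compared against one reference list per candidate language.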