languageTrigrams function - language_detect_utils library

languageTrigrams function

List<String> languageTrigrams(String text)

Extracts ranked trigrams from text, most frequent first.

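Lowercases the input, collapses each run of whitespace to a single space, and trims leading and trailing whitespace, so casing and spacing do not fragment the profile. Interior spaces are kept as trigram characters because word boundaries carry strong language signal (e.g. 'the' vs ' th'). Returns an empty list when the normalized text is shorter than _kMinTrigrams characters.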
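As a minimal sketch of the normalization step on its own: the helper name normalizeForTrigrams is hypothetical and not part of the library; its body is the same expression the implementation applies before counting.

```dart
/// Hypothetical helper (not part of language_detect_utils) that isolates the
/// normalization performed before any trigram is counted.
String normalizeForTrigrams(String text) {
  // Lowercase first, then squeeze every whitespace run (spaces, tabs,
  // newlines) into one space, and drop leading/trailing whitespace.
  return text.toLowerCase().replaceAll(RegExp(r'\s+'), ' ').trim();
}
```

For instance, normalizeForTrigrams('  The\tQUICK\n\nfox ') yields 'the quick fox', so the only spaces left to appear in trigrams are single word boundaries.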

Example:

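languageTrigrams('the the'); // ['the', 'he ', 'e t', ...]

Here 'the' occurs twice and every other trigram once, so 'the' ranks first. The relative order of equally frequent trigrams is not guaranteed, because List.sort is not a stable sort.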
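To see why 'the' leads the example, it can help to look at the raw counts before ranking. The sketch below uses a hypothetical helper, trigramCounts, that reproduces only the counting loop and assumes its input has already been normalized.

```dart
/// Hypothetical helper (not part of language_detect_utils): counts every
/// length-3 window of already-normalized text.
Map<String, int> trigramCounts(String clean) {
  final Map<String, int> counts = <String, int>{};
  // Slide a 3-character window one position at a time; inputs shorter
  // than 3 characters produce no windows and an empty map.
  for (int i = 0; i + 3 <= clean.length; i++) {
    final String gram = clean.substring(i, i + 3);
    counts[gram] = (counts[gram] ?? 0) + 1;
  }
  return counts;
}
```

For 'the the' the windows are 'the', 'he ', 'e t', ' th', and 'the' again, so the map holds 'the' with count 2 and the other three trigrams with count 1 each.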

Audited: 2026-06-12 11:26 EDT

Implementation

List<String> languageTrigrams(String text) {
  final String clean = text.toLowerCase().replaceAll(RegExp(r'\s+'), ' ').trim();
  if (clean.length < _kMinTrigrams) return <String>[];
  // Count every length-3 window, then rank by descending frequency so the most
  // characteristic trigrams dominate the out-of-place comparison.
  final Map<String, int> counts = <String, int>{};
  for (int i = 0; i + 3 <= clean.length; i++) {
    final String gram = clean.substring(i, i + 3);
    counts[gram] = (counts[gram] ?? 0) + 1;
  }
  // Read counts with ?? 0 so the sort never uses a bare ! on the map lookup;
  // every key came from the same map, so 0 is an unreachable but safe fallback.
  return counts.keys.toList()
    ..sort((a, b) => (counts[b] ?? 0).compareTo(counts[a] ?? 0));
}
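The implementation comments mention an out-of-place comparison but this page does not show it. The following is a sketch of one common formulation, in which each trigram contributes the difference between its rank in the document and its rank in a reference profile. The function name outOfPlaceDistance and the choice of profile.length as the penalty for a missing trigram are assumptions for illustration, not the library's confirmed behavior.

```dart
/// Hypothetical out-of-place distance between two ranked trigram lists.
/// Lower values mean the document's ranking is closer to the profile's.
int outOfPlaceDistance(List<String> document, List<String> profile) {
  // Map each profile trigram to its rank for constant-time lookups.
  final Map<String, int> profileRank = <String, int>{
    for (int i = 0; i < profile.length; i++) profile[i]: i,
  };
  // Assumed penalty for a trigram that the profile does not contain.
  final int maxPenalty = profile.length;
  int distance = 0;
  for (int i = 0; i < document.length; i++) {
    final int? rank = profileRank[document[i]];
    // Known trigram: add how far its rank moved. Unknown: add the penalty.
    distance += rank == null ? maxPenalty : (i - rank).abs();
  }
  return distance;
}
```

Under these assumptions, a caller would compute languageTrigrams(text) once, score it against each language's stored profile, and pick the language with the smallest distance.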