normalizeWordCaseCoherence function

String normalizeWordCaseCoherence(
  String line
)

Normalizes stray case-flipped characters within case-consistent words.

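In each run of ASCII letters (A-Z, a-z) that is at least 3 letters long, if all but one letter share the same case, the outlier is converted to match: "HELLo" becomes "HELLO" and "wOrld" becomes "world". A leading capital on an otherwise lowercase word is preserved, so "Hello" is left as is. Shorter words, and words with more than one outlier, are returned unchanged. This corrects a common OCR error in which a single character is recognized in the wrong case.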

Implementation

String normalizeWordCaseCoherence(String line) {
  return line.replaceAllMapped(RegExp(r'[A-Za-z]+'), (match) {
    final String word = match.group(0)!;
    if (word.length < _minWordLengthForCaseCoherence) {
      return word;
    }

    int upper = 0;
    int lower = 0;
    for (int i = 0; i < word.length; i++) {
      final int code = word.codeUnitAt(i);
      if (isUpper(code)) {
        upper++;
      } else if (isLower(code)) {
        lower++;
      }
    }

    final int total = upper + lower;
    if (total < _minWordLengthForCaseCoherence) {
      return word;
    }

    // At most one letter is in the minority case: normalize to the majority.
    // (total >= _minWordLengthForCaseCoherence is guaranteed by the early
    // return above, so no further length check is needed here.)
    if (upper >= total - 1 && upper > lower) {
      return word.toUpperCase();
    }
    if (lower >= total - 1 && lower > upper) {
      // Preserve a leading capital (Title Case).
      if (isUpper(word.codeUnitAt(0))) {
        return sentenceCase(word.toLowerCase());
      }
      return word.toLowerCase();
    }

    return word;
  });
}
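
Example

The implementation above relies on helpers that are not shown on this page. The following is a minimal, self-contained sketch for trying the behavior in isolation. The definitions of _minWordLengthForCaseCoherence, isUpper, isLower, and sentenceCase are hypothetical stand-ins: the threshold of 3 is taken from the description, and the helper bodies are assumptions about what the library's versions do, not copies of them.

```dart
// Hypothetical stand-ins for names the library defines elsewhere.
// The value 3 follows the "3+ letters" wording in the description.
const int _minWordLengthForCaseCoherence = 3;

// Assumed: ASCII-only checks on a UTF-16 code unit.
bool isUpper(int code) => code >= 0x41 && code <= 0x5A; // 'A'..'Z'
bool isLower(int code) => code >= 0x61 && code <= 0x7A; // 'a'..'z'

// Assumed: upper-cases the first character and leaves the rest untouched.
String sentenceCase(String s) =>
    s.isEmpty ? s : s[0].toUpperCase() + s.substring(1);

// Same logic as the implementation above, written against the stand-ins.
String normalizeWordCaseCoherence(String line) {
  return line.replaceAllMapped(RegExp(r'[A-Za-z]+'), (match) {
    final String word = match.group(0)!;
    if (word.length < _minWordLengthForCaseCoherence) {
      return word;
    }

    int upper = 0;
    int lower = 0;
    for (int i = 0; i < word.length; i++) {
      final int code = word.codeUnitAt(i);
      if (isUpper(code)) {
        upper++;
      } else if (isLower(code)) {
        lower++;
      }
    }

    final int total = upper + lower;
    if (total < _minWordLengthForCaseCoherence) {
      return word;
    }

    // At most one letter is in the minority case.
    if (upper >= total - 1 && upper > lower) {
      return word.toUpperCase();
    }
    if (lower >= total - 1 && lower > upper) {
      // Keep a leading capital.
      if (isUpper(word.codeUnitAt(0))) {
        return sentenceCase(word.toLowerCase());
      }
      return word.toLowerCase();
    }
    return word;
  });
}

void main() {
  print(normalizeWordCaseCoherence('HELLo wOrld')); // HELLO world
  print(normalizeWordCaseCoherence('Hello'));       // Hello (leading capital kept)
  print(normalizeWordCaseCoherence('HeLLo'));       // HeLLo (two outliers, unchanged)
  print(normalizeWordCaseCoherence('Ok'));          // Ok (below the length threshold)
  print(normalizeWordCaseCoherence('TOTAl due: 12.50 USd')); // TOTAL due: 12.50 USD
  print(normalizeWordCaseCoherence('iOS'));         // IOS
}
```

The last call shows a side effect worth keeping in mind: with a threshold of 3, a short word that is intentionally mixed-case, such as "iOS", has exactly one minority letter and is therefore upper-cased. Callers that need to keep such tokens may want to filter them out before normalizing.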