normalizeWordCaseCoherence function
Normalizes stray case-flipped characters within case-consistent words.
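Each run of ASCII letters in the input line is treated as a word. In words of three or more letters where all but one letter share the same case, the outlier is changed to match the majority: "tHe" becomes "the" and "THe" becomes "THE". A word whose only uppercase letter is the first keeps its Title Case ("Hello" stays "Hello"). Words with two or more letters in the minority case, such as "WoRld", are left unchanged. This targets a common OCR error in which a single character is recognized in the wrong case.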
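The "all but one" rule reduces to a comparison on the counts of uppercase and lowercase letters. The sketch below isolates that decision for a single ASCII word. The names CaseVerdict, classifyWordCase and minLength are hypothetical, and the minimum length of 3 is an assumption taken from the "3+ letters" wording above, not the library's own constant.

```dart
/// Hypothetical outcome of the case-coherence check for one word.
enum CaseVerdict { upper, lower, keep }

/// Assumed minimum word length, matching the "3+ letters" rule.
const int minLength = 3;

/// Classifies an ASCII word by counting the letters in each case.
CaseVerdict classifyWordCase(String word) {
  int upper = 0;
  int lower = 0;
  for (final int code in word.codeUnits) {
    if (code >= 0x41 && code <= 0x5A) {
      upper++; // 'A'..'Z'
    } else if (code >= 0x61 && code <= 0x7A) {
      lower++; // 'a'..'z'
    }
  }
  final int total = upper + lower;
  if (total < minLength) return CaseVerdict.keep;
  // At most one letter disagrees with the uppercase majority.
  if (upper >= total - 1) return CaseVerdict.upper;
  // At most one letter disagrees with the lowercase majority.
  if (lower >= total - 1) return CaseVerdict.lower;
  // Two or more letters are in the minority case.
  return CaseVerdict.keep;
}
```

With these definitions, classifyWordCase('tHe') yields CaseVerdict.lower, classifyWordCase('THe') yields CaseVerdict.upper, and classifyWordCase('WoRld') yields CaseVerdict.keep because two of its five letters are uppercase. For words of at least three letters the two thresholds cannot both hold, which is why no tie-breaking is needed.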
Implementation
String normalizeWordCaseCoherence(String line) {
  // Treat each maximal run of ASCII letters as one word.
  return line.replaceAllMapped(RegExp(r'[A-Za-z]+'), (match) {
    final String word = match.group(0)!;
    // Words shorter than the minimum length are returned unchanged.
    if (word.length < _minWordLengthForCaseCoherence) {
      return word;
    }
    // Count letters in each case (isUpper and isLower are library helpers not shown here).
    int upper = 0;
    int lower = 0;
    for (int i = 0; i < word.length; i++) {
      final int code = word.codeUnitAt(i);
      if (isUpper(code)) {
        upper++;
      } else if (isLower(code)) {
        lower++;
      }
    }
    final int total = upper + lower;
    if (total < _minWordLengthForCaseCoherence) {
      return word;
    }
    // All but at most one letter are uppercase: upper-case the whole word.
    if (upper >= total - 1 && upper > lower) {
      return word.toUpperCase();
    }
    // All but at most one letter are lowercase: lower-case the word,
    // keeping a leading capital so Title Case survives.
    if (lower >= total - 1 && lower > upper) {
      if (isUpper(word.codeUnitAt(0))) {
        return sentenceCase(word.toLowerCase());
      }
      return word.toLowerCase();
    }
    // Two or more letters disagree with the majority: leave the word alone.
    return word;
  });
}
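The implementation depends on isUpper, isLower, sentenceCase and _minWordLengthForCaseCoherence, which are defined elsewhere and not shown on this page. To try the function in isolation, the stand-ins below can be used. They are assumptions, not the library's definitions: the case checks are ASCII-only, sentenceCase upper-cases only the first character, and the minimum length of 3 comes from the description above. The function body follows the listing above.

```dart
// Hypothetical stand-ins for helpers the real library defines elsewhere.
const int _minWordLengthForCaseCoherence = 3; // assumed from "3+ letters"

/// True for ASCII 'A'..'Z' (assumed behaviour of the real helper).
bool isUpper(int code) => code >= 0x41 && code <= 0x5A;

/// True for ASCII 'a'..'z' (assumed behaviour of the real helper).
bool isLower(int code) => code >= 0x61 && code <= 0x7A;

/// Upper-cases the first character and leaves the rest unchanged (assumed).
String sentenceCase(String s) =>
    s.isEmpty ? s : s[0].toUpperCase() + s.substring(1);

String normalizeWordCaseCoherence(String line) {
  return line.replaceAllMapped(RegExp(r'[A-Za-z]+'), (Match match) {
    final String word = match.group(0)!;
    if (word.length < _minWordLengthForCaseCoherence) {
      return word;
    }
    int upper = 0;
    int lower = 0;
    for (final int code in word.codeUnits) {
      if (isUpper(code)) {
        upper++;
      } else if (isLower(code)) {
        lower++;
      }
    }
    final int total = upper + lower;
    if (total < _minWordLengthForCaseCoherence) {
      return word;
    }
    // Single lowercase outlier: force the whole word to uppercase.
    if (upper >= total - 1 && upper > lower) {
      return word.toUpperCase();
    }
    // Single uppercase outlier: lowercase, preserving a leading capital.
    if (lower >= total - 1 && lower > upper) {
      return isUpper(word.codeUnitAt(0))
          ? sentenceCase(word.toLowerCase())
          : word.toLowerCase();
    }
    return word;
  });
}
```

With these stand-ins, normalizeWordCaseCoherence('tHe quick bRown fox') returns 'the quick brown fox', and 'hello, wORLD!' becomes 'hello, WORLD!'. One consequence of the single-outlier rule is that it cannot tell OCR noise from intentional mixed case: 'iPhone' has exactly one uppercase letter that is not the first, so it is lowered to 'iphone'.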