sentenceFixZeroAnO function
Processes text to correct common OCR errors, focusing on zero/letter 'O' confusion.
This function splits the input into space-separated words and breaks each word into alphanumeric segments. It then uses the mix of digits and letters in each segment to decide whether ambiguous characters should be read as digits or letters. It specifically targets the common OCR confusion between the digit '0' and the letter 'O'.
Each segment receives one of two corrections:
- Segments that are mostly numeric are passed to digitCorrection, which converts letter look-alikes to digits
- All other segments are passed to replaceBadDigitsKeepCasing, which converts '0' characters to the letter 'O'
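The helpers that make these decisions (CharacterStats, digitCorrection and replaceBadDigitsKeepCasing) are not shown in this documentation. The following is a minimal Dart sketch of how they could work, assuming a segment counts as "mostly numeric" when it has strictly more digits than letters, and assuming an illustrative substitution table (O/o to 0, I/l to 1, S to 5, B to 8). Both the threshold and the table are hypothetical and may differ from the real helpers.

```dart
// Hypothetical sketches of the helpers called by sentenceFixZeroAnO.
// The real implementations are not shown; the "mostly digits" rule and
// the substitution table below are illustrative assumptions.

/// Counts ASCII digits and ASCII letters in a segment.
class CharacterStats {
  int digits = 0;
  int letters = 0;

  void inspect(String text) {
    for (final int c in text.codeUnits) {
      if (c >= 0x30 && c <= 0x39) {
        digits++; // '0'..'9'
      } else if ((c >= 0x41 && c <= 0x5A) || (c >= 0x61 && c <= 0x7A)) {
        letters++; // 'A'..'Z' or 'a'..'z'
      }
    }
  }

  /// Assumed rule: strictly more digits than letters.
  bool mostlyDigits() => digits > letters;
}

/// Assumed table of letters that OCR commonly confuses with digits.
const Map<String, String> _letterToDigit = <String, String>{
  'O': '0', 'o': '0', 'I': '1', 'l': '1', 'S': '5', 'B': '8',
};

/// Replaces digit look-alike letters; other characters pass through.
String digitCorrection(String segment) => segment
    .split('')
    .map((String ch) => _letterToDigit[ch] ?? ch)
    .join();

/// Replaces the digit '0' with the letter 'O' and leaves every other
/// character, including the case of existing letters, unchanged.
String replaceBadDigitsKeepCasing(String segment) =>
    segment.replaceAll('0', 'O');
```

With these sketches plugged in, sentenceFixZeroAnO('ORD+2O25O615 W0RLD') returns 'ORD+20250615 WORLD'. Classifying the whole token "ORD+2O25O615" at once would instead count 6 digits against 5 letters and rewrite the leading 'O' of "ORD" as '0'.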
inputSentence is the text string to be processed.
Returns the corrected text with the appropriate character substitutions applied. Spaces and non-alphanumeric separators are preserved; newline characters are removed.
Implementation
```dart
String sentenceFixZeroAnO(final String inputSentence) {
  // Split the input into individual words for processing.
  final List<String> words = inputSentence.split(' ');
  for (int i = 0; i < words.length; i++) {
    // Remove any newline characters that might be present. Text on either
    // side of a newline is joined into a single word.
    final String word = words[i].replaceAll('\n', '');
    if (word.isEmpty) {
      // Nothing to analyze, but write the stripped word back so that a
      // word consisting only of newlines does not keep them.
      words[i] = word;
      continue;
    }
    // Split on non-alphanumeric boundaries so that mixed tokens like
    // "ORD+20250615" are analyzed segment by segment. This prevents
    // a mostly-digit suffix from forcing letter→digit conversion on
    // an alphabetic prefix.
    words[i] = word.splitMapJoin(
      RegExp(r'[A-Za-z0-9]+'),
      onMatch: (Match m) {
        final String segment = m.group(0)!;
        final CharacterStats stats = CharacterStats()..inspect(segment);
        if (stats.mostlyDigits()) {
          return digitCorrection(segment);
        }
        return replaceBadDigitsKeepCasing(segment);
      },
      // Separators such as '+' or '-' are kept unchanged.
      onNonMatch: (String sep) => sep,
    );
  }
  // Rejoin the corrected words into a sentence.
  return words.join(' ');
}
```
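To see which pieces the splitMapJoin call in the implementation visits, the following helper runs the same RegExp over a single token and records each alphanumeric segment and each non-empty separator. describeSegments is hypothetical and not part of the original code. Empty non-matched parts are skipped because splitMapJoin may also pass empty strings to onNonMatch before the first and after the last match.

```dart
/// Hypothetical helper: lists the segments and separators that
/// sentenceFixZeroAnO would visit for one token, using the same
/// RegExp and String.splitMapJoin call as the implementation.
List<String> describeSegments(String token) {
  final List<String> parts = <String>[];
  token.splitMapJoin(
    RegExp(r'[A-Za-z0-9]+'),
    onMatch: (Match m) {
      final String segment = m.group(0)!;
      parts.add('segment:$segment');
      return segment;
    },
    onNonMatch: (String sep) {
      // Ignore the empty parts around matches at the token edges.
      if (sep.isNotEmpty) {
        parts.add('separator:$sep');
      }
      return sep;
    },
  );
  return parts;
}
```

For example, describeSegments('ORD+2O25O615') returns ['segment:ORD', 'separator:+', 'segment:2O25O615'], so "ORD" and "2O25O615" are classified independently.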