post_process_case library
Case normalization passes for OCR post-processing.
Handles I/l ambiguity resolution, word-level case coherence, line-level case normalization, and name-like title-case formatting.
Functions
- normalizeCodeLikeTokens(String line) → String
  Normalizes mixed alphanumeric code-like tokens without touching prose.
- normalizeLineCase(String line) → String
  Normalizes dominant line casing while preserving mixed-case lines.
- normalizeNameLikeLineTitleCase(String line) → String
  Applies title case to all words in name-like lines.
- normalizeShortUppercaseDictionaryWords(String line) → String
  Lowercases short all-caps dictionary words inside sentence-like lines.
- normalizeStructuredFieldLine(String line, {required bool applyDictionary}) → String
  Normalizes simple structured field lines such as "Name: john smith".
- normalizeWordCaseCoherence(String line) → String
  Normalizes stray case-flipped characters within case-consistent words.
- resolveILAmbiguity(String line) → String
  Resolves I/l ambiguity based on word-level case context.
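
To make two of the descriptions above more concrete, here is a minimal, hypothetical sketch of what word-level I/l resolution and dominant-case line normalization could look like. The names sketchResolveIL and sketchNormalizeLineCase, the 0.8 threshold, and the exact rules are assumptions made for illustration; they are not taken from this library and may differ from what resolveILAmbiguity and normalizeLineCase actually do.

```dart
// Hypothetical sketch only: NOT the library's implementation.

/// Resolves I/l confusion per word, using the case of the other letters.
/// - Other letters all uppercase: treat 'l' as a misread 'I'.
/// - Other letters all lowercase: treat a non-leading 'I' as a misread 'l'.
/// - Mixed-case words and words made only of I/l are left unchanged.
String sketchResolveIL(String line) {
  return line.replaceAllMapped(RegExp(r'[A-Za-z]+'), (m) {
    final word = m[0]!;
    final others = word.replaceAll(RegExp(r'[Il]'), '');
    if (others.isEmpty) return word; // no context to decide from
    if (others == others.toUpperCase()) {
      return word.replaceAll('l', 'I');
    }
    if (others == others.toLowerCase()) {
      // Keep the first character so a legitimate leading capital I survives.
      return word[0] + word.substring(1).replaceAll('I', 'l');
    }
    return word;
  });
}

/// Uppercases the whole line when uppercase letters clearly dominate
/// (assumed threshold: 80% of letters); otherwise returns it unchanged.
String sketchNormalizeLineCase(String line, {double threshold = 0.8}) {
  final letters = RegExp(r'[A-Za-z]').allMatches(line).length;
  if (letters == 0) return line;
  final upper = RegExp(r'[A-Z]').allMatches(line).length;
  return upper / letters >= threshold ? line.toUpperCase() : line;
}

void main() {
  print(sketchResolveIL('FlLE fiIe Hello')); // FILE file Hello
  print(sketchNormalizeLineCase('TOTAL AMOUNt DUE')); // TOTAL AMOUNT DUE
  print(sketchNormalizeLineCase('Total Amount Due')); // Total Amount Due
}
```

The sketch deliberately does nothing when the evidence is weak (mixed-case words, mixed-case lines), which mirrors the "preserving mixed-case lines" wording above; a real implementation would likely need extra rules for acronyms, names, and non-ASCII letters.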