post_process_numeric library
Numeric normalization passes for OCR post-processing.
Handles digit segment corrections, numeric gap repair, and date separator normalization.
Constants
- digitConfusionMap → const Map<String, String>
  Map of letters commonly confused with digits by OCR.
- digitNonAlnumMap → const Map<String, String>
  Map of non-alphanumeric characters commonly confused with digits.
- highConfidenceDigitLookalikes → const Set<String>
  Letters that are high-confidence digit lookalikes: safe to convert to digits even with only one digit-dominant neighbor.
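
This page does not list the entries of these constants. As a rough illustration only, the sketch below shows how a confusion map and a high-confidence set could work together. All names (`sketchConfusionMap`, `sketchHighConfidence`, `sketchFixLookalikes`, `isAsciiDigit`) and all map entries are hypothetical. The rule that other letters need a digit on both sides is also an assumption, and "digit-dominant neighbor" is simplified here to "adjacent character is a digit".

```dart
// Hypothetical entries; the library's real maps are not shown on this page.
const Map<String, String> sketchConfusionMap = {
  'O': '0',
  'l': '1',
  'I': '1',
  'S': '5',
  'B': '8',
};

// Hypothetical subset treated as high-confidence lookalikes.
const Set<String> sketchHighConfidence = {'O', 'l'};

/// Returns true when [ch] is a single ASCII digit.
bool isAsciiDigit(String ch) {
  if (ch.length != 1) return false;
  final code = ch.codeUnitAt(0);
  return code >= 0x30 && code <= 0x39;
}

/// Replaces mapped letters that sit next to digits in [token].
/// High-confidence letters need one digit neighbor; others need two
/// (an assumed rule, not taken from the library).
String sketchFixLookalikes(String token) {
  final out = StringBuffer();
  for (var i = 0; i < token.length; i++) {
    final ch = token[i];
    final digit = sketchConfusionMap[ch];
    if (digit == null) {
      out.write(ch);
      continue;
    }
    // Neighbors are read from the original token, not the corrected output.
    final left = i > 0 && isAsciiDigit(token[i - 1]);
    final right = i + 1 < token.length && isAsciiDigit(token[i + 1]);
    final needed = sketchHighConfidence.contains(ch) ? 1 : 2;
    final have = (left ? 1 : 0) + (right ? 1 : 0);
    out.write(have >= needed ? digit : ch);
  }
  return out.toString();
}

void main() {
  print(sketchFixLookalikes('1O5')); // 105
  print(sketchFixLookalikes('4S')); // 4S (only one digit neighbor)
  print(sketchFixLookalikes('4S2')); // 452
}
```

Keeping the high-confidence set separate from the map lets the stricter two-neighbor rule protect ordinary words while still correcting common cases such as a trailing `O` after a digit.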
Functions
- normalizeDateSeparators(String line) → String
  Removes OCR-introduced spaces around date separators and digit clusters.
- normalizeDigitSegments(String line) → String
  Corrects letter-like confusions inside digit-dominant token segments.
- normalizeNumericGaps(String line) → String
  Repairs noisy separators and spacing in numeric expressions.
- normalizeStandaloneDecimalLikeToken(String line) → String
  Normalizes standalone decimal-like tokens made only of digit lookalikes.
- normalizeStructuredNumericFieldValue(String line) → String
  Normalizes numeric-like values in simple structured field lines.
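
Every function above takes a `String line` and returns a `String`, so the passes can be chained. The sketch below shows one way to do that. It uses stand-in passes rather than the library's functions, because their exact behavior and import path are not given on this page. `LinePass`, `runPasses`, `sketchDigitSegments`, and `sketchDateSeparators` are hypothetical names, and the pass order is an assumption.

```dart
/// Shape shared by all passes listed above: one line in, one line out.
typedef LinePass = String Function(String line);

/// Applies each pass in order, feeding each result into the next.
String runPasses(String line, List<LinePass> passes) {
  var current = line;
  for (final pass in passes) {
    current = pass(current);
  }
  return current;
}

// Stand-in pass (not the library's implementation): turns a capital O
// that sits between two digits into a zero.
String sketchDigitSegments(String line) => line.replaceAllMapped(
      RegExp(r'(\d)O(?=\d)'),
      (m) => '${m[1]}0',
    );

// Stand-in pass (not the library's implementation): drops spaces around
// '/', '.' or '-' when the separator sits between two digits.
String sketchDateSeparators(String line) => line.replaceAllMapped(
      RegExp(r'(\d)\s*([/.-])\s*(?=\d)'),
      (m) => '${m[1]}${m[2]}',
    );

void main() {
  final cleaned = runPasses('12 / 03 / 2O24', [
    sketchDigitSegments,
    sketchDateSeparators,
  ]);
  print(cleaned); // 12/03/2024
}
```

With the real library, the same `runPasses` helper could take `normalizeDigitSegments`, `normalizeNumericGaps`, and `normalizeDateSeparators` directly, since their signatures match `LinePass`.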