post_process_helpers library
Shared utility functions and constants for OCR post-processing passes.
Constants
- asciiCaseOffset → const int
- carriageReturnCodeUnit → const int
- digitNineCodeUnit → const int
- digitZeroCodeUnit → const int
- lineFeedCodeUnit → const int
- lowercaseACodeUnit → const int
- lowercaseZCodeUnit → const int
- minAcronymTokenLength → const int
- minMixedCaseTokenLength → const int
- regexGroupFirst → const int
- regexGroupSecond → const int
- spaceCodeUnit → const int
- tabCodeUnit → const int
- uppercaseACodeUnit → const int
- uppercaseZCodeUnit → const int
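The listing above gives only names and types. As a rough sketch, and assuming the code-unit constants hold the standard ASCII values their names suggest, they could be declared as below. The values are assumptions, not taken from the library source; `isAsciiWhitespace` is a hypothetical helper added only to show how the constants combine. The token-length and regex-group constants are omitted because their values cannot be inferred from the names.

```dart
// Assumed values: the listing documents names and types only.
const int tabCodeUnit = 0x09; // '\t'
const int lineFeedCodeUnit = 0x0A; // '\n'
const int carriageReturnCodeUnit = 0x0D; // '\r'
const int spaceCodeUnit = 0x20; // ' '
const int digitZeroCodeUnit = 0x30; // '0'
const int digitNineCodeUnit = 0x39; // '9'
const int uppercaseACodeUnit = 0x41; // 'A'
const int uppercaseZCodeUnit = 0x5A; // 'Z'
const int lowercaseACodeUnit = 0x61; // 'a'
const int lowercaseZCodeUnit = 0x7A; // 'z'

// In ASCII, each lowercase letter sits 32 code units above its uppercase form.
const int asciiCaseOffset = lowercaseACodeUnit - uppercaseACodeUnit;

/// Hypothetical helper (not part of the listing): true for tab, line feed,
/// carriage return, or space.
bool isAsciiWhitespace(int code) =>
    code == spaceCodeUnit ||
    code == tabCodeUnit ||
    code == lineFeedCodeUnit ||
    code == carriageReturnCodeUnit;
```

Named constants like these keep range checks readable, for example `code >= digitZeroCodeUnit && code <= digitNineCodeUnit` instead of bare numeric literals.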
Functions
-
hasCodeLikeToken(
String value) → bool - Returns true when any whitespace-delimited token mixes letters and digits.
-
isAcronym(
String token) → bool - Returns true if the token is likely an acronym (all caps).
-
isAlphaWord(
String value) → bool - Returns true when value contains only ASCII letters.
-
isDigit(
int code) → bool - Returns true when code is an ASCII digit.
-
isLetter(
int code) → bool - Returns true when code is an ASCII letter.
-
isLower(
int code) → bool - Returns true when code is an ASCII lowercase letter.
-
isMixedCase(
String token) → bool - Returns true if the token is mixed-case (e.g., 'OpenAI').
-
isTitleCaseWord(
String word) → bool - Returns true when word follows strict ASCII title-case.
-
isUpper(
int code) → bool - Returns true when code is an ASCII uppercase letter.
-
sentenceCase(
String line) → String - Converts the first alphabetic character in a line to uppercase.
-
shouldPreserveLongLowercaseProse(
String value, {required int minTokens, required int minLetters}) → bool - Returns true for long already-lowercase prose that should stay lowercase.
-
toTitleCaseWord(
String word) → String - Converts word to strict ASCII title-case.
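
The one-line descriptions above do not show how the helpers behave. The sketch below is one plausible way to write several of them from those descriptions alone; the bodies are assumptions, not the library's actual code. In particular, "strict ASCII title-case" is assumed to mean one uppercase letter followed only by lowercase letters, and an empty string is assumed not to count as an alpha word.

```dart
// Hypothetical bodies inferred from the doc summaries; not the library source.

bool isDigit(int code) => code >= 0x30 && code <= 0x39; // '0'..'9'
bool isUpper(int code) => code >= 0x41 && code <= 0x5A; // 'A'..'Z'
bool isLower(int code) => code >= 0x61 && code <= 0x7A; // 'a'..'z'
bool isLetter(int code) => isUpper(code) || isLower(code);

/// True when [value] is non-empty and every code unit is an ASCII letter.
bool isAlphaWord(String value) =>
    value.isNotEmpty && value.codeUnits.every(isLetter);

/// Assumed rule: first unit uppercase, all remaining units lowercase.
bool isTitleCaseWord(String word) {
  if (word.isEmpty || !isUpper(word.codeUnitAt(0))) return false;
  return word.codeUnits.skip(1).every(isLower);
}

/// Uppercases a leading ASCII letter and lowercases every later ASCII letter.
/// Non-letter code units are left unchanged.
String toTitleCaseWord(String word) {
  final units = word.codeUnits.toList();
  for (var i = 0; i < units.length; i++) {
    final c = units[i];
    if (i == 0 && isLower(c)) {
      units[i] = c - 0x20; // ASCII case offset
    } else if (i > 0 && isUpper(c)) {
      units[i] = c + 0x20;
    }
  }
  return String.fromCharCodes(units);
}

/// Uppercases the first ASCII letter found in [line], skipping any leading
/// non-letters. Returns [line] unchanged if that letter is already uppercase
/// or if there is no letter at all.
String sentenceCase(String line) {
  for (var i = 0; i < line.length; i++) {
    final c = line.codeUnitAt(i);
    if (!isLetter(c)) continue;
    if (isUpper(c)) return line;
    return line.substring(0, i) +
        String.fromCharCode(c - 0x20) +
        line.substring(i + 1);
  }
  return line;
}

/// True when any whitespace-delimited token has both a letter and a digit.
bool hasCodeLikeToken(String value) {
  for (final token in value.split(RegExp(r'\s+'))) {
    final units = token.codeUnits;
    if (units.any(isLetter) && units.any(isDigit)) return true;
  }
  return false;
}
```

Under these assumptions, `toTitleCaseWord('hELLO')` yields `'Hello'`, `sentenceCase('  "hello world')` yields `'  "Hello world'`, and `hasCodeLikeToken('abc 123')` is false because no single token mixes letters and digits. Working on code units rather than locale-aware case mapping keeps the behavior ASCII-only, which matches the wording of the summaries.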