post_process_helpers library

Shared utility functions and constants for OCR post-processing passes.

Functions

hasCodeLikeToken(String value) → bool
Returns true when any whitespace-delimited token mixes letters and digits.
isAcronym(String token) → bool
Returns true if the token is likely an acronym (all caps).
isAlphaWord(String value) → bool
Returns true when value contains only ASCII letters.
isDigit(int code) → bool
Returns true when code is an ASCII digit.
isLetter(int code) → bool
Returns true when code is an ASCII letter.
isLower(int code) → bool
Returns true when code is an ASCII lowercase letter.
isMixedCase(String token) → bool
Returns true if the token is mixed-case (e.g., 'OpenAI').
isTitleCaseWord(String word) → bool
Returns true when word follows strict ASCII title-case.
isUpper(int code) → bool
Returns true when code is an ASCII uppercase letter.
sentenceCase(String line) → String
Converts the first alphabetic character in a line to uppercase.
shouldPreserveLongLowercaseProse(String value, {required int minTokens, required int minLetters}) → bool
Returns true for long already-lowercase prose that should stay lowercase.
toTitleCaseWord(String word) → String
Converts word to strict ASCII title-case.
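
Example

The one-line descriptions above leave the exact behavior open, so the following is a minimal sketch of how a few of these helpers might behave. It is not the library's source. The functions are hypothetical stand-ins that reuse the documented names and signatures, and they assume that the int arguments are UTF-16 code units, that isAlphaWord returns false for an empty string, and that "alphabetic" in sentenceCase means ASCII letters only.

```dart
// Hypothetical re-implementations for illustration only. The real
// post_process_helpers source is not shown on this page and may differ.

bool isDigit(int code) => code >= 0x30 && code <= 0x39; // '0'..'9'
bool isUpper(int code) => code >= 0x41 && code <= 0x5A; // 'A'..'Z'
bool isLower(int code) => code >= 0x61 && code <= 0x7A; // 'a'..'z'
bool isLetter(int code) => isUpper(code) || isLower(code);

// Assumption: an empty string is not treated as an alpha word.
bool isAlphaWord(String value) =>
    value.isNotEmpty && value.codeUnits.every(isLetter);

// True when at least one whitespace-delimited token contains both an
// ASCII letter and an ASCII digit.
bool hasCodeLikeToken(String value) {
  for (final token in value.split(RegExp(r'\s+'))) {
    final units = token.codeUnits;
    if (units.any(isLetter) && units.any(isDigit)) return true;
  }
  return false;
}

// Uppercases the first ASCII letter found in the line and leaves every
// other character untouched. Lines without letters are returned as-is.
String sentenceCase(String line) {
  for (var i = 0; i < line.length; i++) {
    final code = line.codeUnitAt(i);
    if (!isLetter(code)) continue;
    if (isUpper(code)) return line;
    return line.substring(0, i) +
        String.fromCharCode(code - 0x20) +
        line.substring(i + 1);
  }
  return line;
}

void main() {
  print(hasCodeLikeToken('order ab12 shipped')); // true
  print(hasCodeLikeToken('order 12 shipped')); // false
  print(isAlphaWord('Hello')); // true
  print(sentenceCase('  "hello world"')); //   "Hello world"
}
```

Working on code units keeps the predicates cheap and locale-independent, which matches the "ASCII" wording used throughout the descriptions. Non-ASCII letters are simply not classified as letters in this sketch.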