correction library
This library is part of the Textify package. It provides text correction utilities for improving OCR results through dictionary matching and character substitution algorithms.
Classes
- CharacterStats
  Utility class to analyze character statistics in text.
Functions
- applyCorrection(String inputParagraph, bool applyDictionary) → String
  Applies dictionary-based correction to the extracted text.
- applyDictionaryCorrectionOnSingleSentence(String inputSentence, Map<String, List<String>> correctionLetters) → String
  Applies dictionary-based correction to inputSentence. It first tries to match words directly in the dictionary, then attempts to substitute commonly confused characters (correctionLetters), and finally finds the closest match in the dictionary if no direct match is found. The original casing of the input words is preserved in the corrected output.
- digitCorrection(String input) → String
  Replaces problematic characters in the input string with their digit representations, but only if the text is mostly composed of digits.
- findClosestMatchingWordInDictionary(String word) → String
  Finds the closest matching word in the dictionary for a given word.
- findClosestWord(Set<String> dictionary, String word) → String
  Finds the closest matching word in a dictionary for a given input word.
- isDigit(String char) → bool
  Checks whether the given string is a digit from 0 to 9.
- isLetter(String character) → bool
  Checks whether the given character is a letter.
- isUpperCase(String str) → bool
  Checks whether the given string is all uppercase.
- levenshteinDistance(String s1, String s2) → int
  Calculates the Levenshtein distance between two strings.
- normalizeCasingOfParagraph(String input) → String
  Normalizes the casing of the input string by processing each sentence.
- normalizeCasingOfSentence(String sentence) → String
  Processes a sentence and applies appropriate casing rules.
- replaceBadDigitsKeepCasing(String word) → String
  Replaces zeros with the letter 'O' in words that are mostly letters.
- sentenceFixZeroAnO(String inputSentence) → String
  Processes text to correct common OCR errors, focusing on zero/letter 'O' confusion.
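
To illustrate what levenshteinDistance and findClosestWord compute, here is a minimal standalone sketch. It is not the package's source: the names sketchLevenshtein and sketchClosestWord are hypothetical, and the case-insensitive comparison and first-match tie-breaking are assumptions made for the example.

```dart
import 'dart:math' as math;

/// Hypothetical re-implementation of the classic Levenshtein distance
/// using two rolling rows of the dynamic-programming table.
int sketchLevenshtein(String s1, String s2) {
  if (s1 == s2) return 0;
  if (s1.isEmpty) return s2.length;
  if (s2.isEmpty) return s1.length;

  // previous[j]: distance between the first i-1 code units of s1
  // and the first j code units of s2.
  var previous = List<int>.generate(s2.length + 1, (j) => j);
  var current = List<int>.filled(s2.length + 1, 0);

  for (var i = 1; i <= s1.length; i++) {
    current[0] = i;
    for (var j = 1; j <= s2.length; j++) {
      final cost = s1.codeUnitAt(i - 1) == s2.codeUnitAt(j - 1) ? 0 : 1;
      current[j] = math.min(
        math.min(current[j - 1] + 1, previous[j] + 1), // insert, delete
        previous[j - 1] + cost, // substitute (or keep)
      );
    }
    // Reuse the two buffers instead of allocating a new row.
    final swap = previous;
    previous = current;
    current = swap;
  }
  return previous[s2.length];
}

/// Hypothetical closest-word lookup: returns the dictionary entry with the
/// smallest distance to [word], or [word] itself if the dictionary is empty.
/// Assumption: comparison ignores case and the first best entry wins ties.
String sketchClosestWord(Set<String> dictionary, String word) {
  var best = word;
  var bestDistance = -1; // -1 means "no candidate seen yet"
  for (final candidate in dictionary) {
    final d = sketchLevenshtein(word.toLowerCase(), candidate.toLowerCase());
    if (bestDistance < 0 || d < bestDistance) {
      bestDistance = d;
      best = candidate;
    }
  }
  return best;
}
```

A linear scan like this costs one distance computation per dictionary entry, which is usually acceptable for small word lists; larger dictionaries may call for pruning by word length first.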
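
The three-stage lookup described for applyDictionaryCorrectionOnSingleSentence (direct match, character substitution, closest match, with casing preserved) could look roughly like the sketch below. All names here are hypothetical. It assumes that correctionLetters maps a confused character to its candidate replacements, that the dictionary holds lowercase words, that words are separated by single spaces, and that the caller supplies the distance function; none of this is confirmed by the package documentation.

```dart
/// Re-applies the casing style of [original] to [corrected].
/// Assumption: only all-uppercase and capitalized styles are preserved.
String applyCasingOf(String original, String corrected) {
  if (original.isEmpty || corrected.isEmpty) return corrected;
  final hasLetters = original.toUpperCase() != original.toLowerCase();
  if (hasLetters && original == original.toUpperCase()) {
    return corrected.toUpperCase();
  }
  final first = original[0];
  if (first == first.toUpperCase() && first != first.toLowerCase()) {
    return corrected[0].toUpperCase() + corrected.substring(1);
  }
  return corrected;
}

/// Hypothetical single-word correction following the three stages.
String sketchCorrectWord(
  String word,
  Set<String> dictionary,
  Map<String, List<String>> correctionLetters,
  int Function(String, String) distance,
) {
  if (word.isEmpty) return word;
  final lower = word.toLowerCase();

  // Stage 1: direct dictionary match, the word is returned unchanged.
  if (dictionary.contains(lower)) return word;

  // Stage 2: replace one commonly confused character at a time.
  for (var i = 0; i < lower.length; i++) {
    final replacements = correctionLetters[lower[i]] ?? const <String>[];
    for (final replacement in replacements) {
      final candidate = lower.replaceRange(i, i + 1, replacement);
      if (dictionary.contains(candidate)) {
        return applyCasingOf(word, candidate);
      }
    }
  }

  // Stage 3: fall back to the dictionary entry with the smallest distance.
  var best = lower;
  var bestDistance = -1; // -1 means "no candidate seen yet"
  for (final entry in dictionary) {
    final d = distance(lower, entry);
    if (bestDistance < 0 || d < bestDistance) {
      bestDistance = d;
      best = entry;
    }
  }
  return applyCasingOf(word, best);
}

/// Applies [sketchCorrectWord] to every space-separated word.
/// Punctuation handling is deliberately left out of this sketch.
String sketchCorrectSentence(
  String sentence,
  Set<String> dictionary,
  Map<String, List<String>> correctionLetters,
  int Function(String, String) distance,
) {
  return sentence
      .split(' ')
      .map((w) => sketchCorrectWord(w, dictionary, correctionLetters, distance))
      .join(' ');
}
```

Any edit-distance function with the signature int Function(String, String) can be passed as distance, for example a Levenshtein implementation. Keeping it as a parameter makes the fallback stage easy to swap or test in isolation.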
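
The digit and zero/'O' heuristics behind digitCorrection and replaceBadDigitsKeepCasing can be pictured with the following sketch. The look-alike table, the "more than half" threshold for "mostly", the ASCII-only letter check, and all function names are assumptions chosen for illustration; the package's actual rules are not listed on this page.

```dart
/// Hypothetical table of letters that OCR often confuses with digits.
const Map<String, String> assumedLookAlikes = {
  'O': '0',
  'o': '0',
  'I': '1',
  'l': '1',
  'S': '5',
  'B': '8',
};

/// True if [ch] is a single ASCII digit from 0 to 9.
bool sketchIsDigit(String ch) {
  if (ch.length != 1) return false;
  final code = ch.codeUnitAt(0);
  return code >= 0x30 && code <= 0x39;
}

/// True if [ch] is a single ASCII letter (assumption: ASCII only).
bool sketchIsLetter(String ch) {
  if (ch.length != 1) return false;
  final code = ch.codeUnitAt(0);
  return (code >= 0x41 && code <= 0x5A) || (code >= 0x61 && code <= 0x7A);
}

/// Replaces look-alike letters with digits, but only when strictly more
/// than half of the characters are already digits.
String sketchDigitCorrection(String input) {
  final chars = input.split('');
  final digitCount = chars.where(sketchIsDigit).length;
  if (digitCount * 2 <= chars.length) return input;
  return chars.map((c) => assumedLookAlikes[c] ?? c).join();
}

/// Replaces zeros with 'O', but only when strictly more than half of the
/// characters are letters.
String sketchZeroToO(String word) {
  final chars = word.split('');
  final letterCount = chars.where(sketchIsLetter).length;
  if (letterCount * 2 <= chars.length) return word;
  return word.replaceAll('0', 'O');
}
```

Gating each substitution on the dominant character class keeps the two rules from undoing each other: a token such as 1O5 is treated as a number, while HELL0 is treated as a word.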