correction library

This library is part of the Textify package. It provides text-correction utilities that improve OCR results through dictionary matching and character substitution.

Classes

CharacterStats
Utility class to analyze character statistics in text.

Functions

applyCorrection(String inputParagraph, bool applyDictionary) String
Applies dictionary-based correction to the extracted text.
applyDictionaryCorrectionOnSingleSentence(String inputSentence, Map<String, List<String>> correctionLetters) String
Applies dictionary-based correction to inputSentence. It first tries to match each word directly in the dictionary, then tries substituting commonly confused characters using correctionLetters, and finally falls back to the closest match in the dictionary if no match has been found. The original casing of the input words is preserved in the corrected output.
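The three-step order described above (direct match, character substitution, closest match) can be sketched roughly as follows. This is an illustration, not the library's source: the names correctSentenceSketch and matchCasingSketch, the explicit dictionary and closestMatch parameters, the whitespace-only tokenization, and the reading of correctionLetters as "character seen in the OCR output mapped to candidate replacements" are all assumptions.

```dart
/// Hypothetical helper: re-applies the casing pattern of [original] to
/// [corrected]. Assumed rules: all caps, capitalized, or left unchanged.
String matchCasingSketch(String original, String corrected) {
  if (original.isEmpty || corrected.isEmpty) return corrected;
  if (original == original.toUpperCase() &&
      original != original.toLowerCase()) {
    return corrected.toUpperCase();
  }
  final first = original[0];
  if (first == first.toUpperCase() && first != first.toLowerCase()) {
    return corrected[0].toUpperCase() + corrected.substring(1);
  }
  return corrected;
}

/// Hypothetical sketch of the three-step correction order. The dictionary is
/// assumed to hold lowercase words; punctuation handling is omitted.
String correctSentenceSketch(
  String sentence,
  Set<String> dictionary,
  Map<String, List<String>> correctionLetters,
  String Function(String word) closestMatch,
) {
  return sentence.split(' ').map((word) {
    final lower = word.toLowerCase();
    // Step 1: direct dictionary hit (empty tokens are passed through).
    if (lower.isEmpty || dictionary.contains(lower)) return word;

    // Step 2: swap each commonly confused character for its alternatives.
    for (final entry in correctionLetters.entries) {
      if (!word.contains(entry.key)) continue;
      for (final replacement in entry.value) {
        final candidate =
            word.replaceAll(entry.key, replacement).toLowerCase();
        if (dictionary.contains(candidate)) {
          return matchCasingSketch(word, candidate);
        }
      }
    }

    // Step 3: fall back to the closest dictionary entry.
    return matchCasingSketch(word, closestMatch(lower));
  }).join(' ');
}
```

Under these assumptions, a call with the dictionary {'hello', 'world'} and correctionLetters {'1': ['l']} would turn 'He11o' into 'Hello' at step 2, while a word with no substitution hit would be handed to closestMatch.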
digitCorrection(String input) String
Replaces problematic characters in the input string with their digit representations, but only if the input text is mostly composed of digits.
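A minimal sketch of the "only when mostly digits" guard is shown below. The look-alike table, the strict more-than-half threshold, and the names lookAlikeDigits, isDigitSketch, and digitCorrectionSketch are assumptions made for illustration; this listing does not show the library's actual mapping or threshold.

```dart
/// Hypothetical look-alike table; the library's real mapping may differ.
const Map<String, String> lookAlikeDigits = {
  'O': '0',
  'o': '0',
  'I': '1',
  'l': '1',
  'S': '5',
  'B': '8',
};

/// True when [char] is a single ASCII digit from 0 to 9.
bool isDigitSketch(String char) =>
    char.length == 1 &&
    char.codeUnitAt(0) >= 0x30 &&
    char.codeUnitAt(0) <= 0x39;

/// Hypothetical sketch: replaces look-alike letters with digits, but only
/// when strictly more than half of the characters are already digits.
String digitCorrectionSketch(String input) {
  if (input.isEmpty) return input;
  final chars = input.split('');
  final digitCount = chars.where(isDigitSketch).length;
  // Assumed threshold: leave the text alone unless digits are the majority.
  if (digitCount * 2 <= chars.length) return input;
  return chars.map((c) => lookAlikeDigits[c] ?? c).join();
}
```

With this threshold, '2O23' (three digits out of four characters) becomes '2023', whereas 'HELLO' is returned unchanged because it contains no digits.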
findClosestMatchingWordInDictionary(String word) String
Finds the closest matching word in the dictionary for a given word.
findClosestWord(Set<String> dictionary, String word) String
Finds the closest matching word in a dictionary for a given input word.
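One way to picture a closest-word lookup is a linear scan that keeps the entry with the smallest distance to the input. The sketch below is an assumption-laden illustration, not the library's code: the name closestWordSketch and the extra distance parameter are hypothetical, as are the tie-breaking rule (first entry in iteration order wins) and the choice to return the input unchanged for an empty dictionary.

```dart
/// Hypothetical sketch: returns the entry of [dictionary] with the smallest
/// [distance] to [word]. Ties go to the first entry in iteration order.
/// Returns [word] unchanged when the dictionary is empty (an assumption).
String closestWordSketch(
  Set<String> dictionary,
  String word,
  int Function(String a, String b) distance,
) {
  var best = word;
  int? bestDistance;
  for (final candidate in dictionary) {
    final d = distance(candidate, word);
    if (bestDistance == null || d < bestDistance) {
      best = candidate;
      bestDistance = d;
    }
  }
  return best;
}
```

Taking the distance as a parameter keeps the sketch neutral about which metric is used; an edit-distance function such as levenshteinDistance would be a natural argument.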
isDigit(String char) bool
Checks whether the given string is a digit from 0 to 9.
isLetter(String character) bool
Checks whether the given character is a letter.
isUpperCase(String str) bool
Checks whether the given string is all uppercase.
levenshteinDistance(String s1, String s2) int
Calculates the Levenshtein distance between two strings.
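For reference, the Levenshtein distance is the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other. The following is a standard two-row dynamic-programming sketch; the name levenshteinSketch is hypothetical, and comparing UTF-16 code units (rather than grapheme clusters) is a simplifying assumption, so this is not necessarily how the library implements it.

```dart
/// Hypothetical sketch of the classic two-row Levenshtein algorithm.
/// Compares UTF-16 code units, which is a simplification for non-BMP text.
int levenshteinSketch(String s1, String s2) {
  if (s1.isEmpty) return s2.length;
  if (s2.isEmpty) return s1.length;

  // previous[j] is the distance between the first i-1 units of s1
  // and the first j units of s2.
  var previous = List<int>.generate(s2.length + 1, (j) => j);

  for (var i = 1; i <= s1.length; i++) {
    final current = List<int>.filled(s2.length + 1, 0);
    current[0] = i; // deleting the first i units of s1
    for (var j = 1; j <= s2.length; j++) {
      final cost = s1.codeUnitAt(i - 1) == s2.codeUnitAt(j - 1) ? 0 : 1;
      final insertion = current[j - 1] + 1;
      final deletion = previous[j] + 1;
      final substitution = previous[j - 1] + cost;
      var best = insertion < deletion ? insertion : deletion;
      if (substitution < best) best = substitution;
      current[j] = best;
    }
    previous = current;
  }
  return previous[s2.length];
}
```

For example, levenshteinSketch('kitten', 'sitting') evaluates to 3: two substitutions and one insertion.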
normalizeCasingOfParagraph(String input) String
Normalizes the casing of the input string by processing each sentence.
normalizeCasingOfSentence(String sentence) String
Processes a sentence and applies appropriate casing rules.
replaceBadDigitsKeepCasing(String word) String
Replaces zeros with the letter 'O' in words that are mostly letters.
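The mirror-image case, zeros inside words that are mostly letters, might look like the sketch below. Everything beyond the one-line description is assumed: the name replaceZerosSketch, the ASCII-only letter test, the strict more-than-half threshold, and the casing rule (uppercase 'O' only when every letter in the word is uppercase, lowercase 'o' otherwise).

```dart
final RegExp _asciiLetter = RegExp(r'[A-Za-z]');
final RegExp _nonAsciiLetter = RegExp(r'[^A-Za-z]');

/// Hypothetical sketch: replaces '0' with 'O' or 'o' when strictly more
/// than half of the characters in [word] are ASCII letters.
String replaceZerosSketch(String word) {
  final letterCount = _asciiLetter.allMatches(word).length;
  // Assumed threshold: leave the word alone unless letters are the majority.
  if (letterCount * 2 <= word.length) return word;

  // Assumed casing rule: follow the casing of the surrounding letters.
  final lettersOnly = word.replaceAll(_nonAsciiLetter, '');
  final allUpper = lettersOnly == lettersOnly.toUpperCase();
  return word.replaceAll('0', allUpper ? 'O' : 'o');
}
```

Under these assumptions 'HELL0' becomes 'HELLO' and 'hell0' becomes 'hello', while '2024' and 'B00K' (only half letters) are returned unchanged.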
sentenceFixZeroAnO(String inputSentence) String
Processes text to correct common OCR errors, focusing on zero/letter 'O' confusion.