edit_distance library
Edit distance algorithms for fuzzy matching
Classes
- CombinedJaccard
- Combines several normalized Jaccard distances, computed for N=1, N=2, N=3, ... The individual distances are weighted by N^2.
- Damerau
- Implementation of the Damerau-Levenshtein distance with transposition (sometimes also called the unrestricted Damerau-Levenshtein distance). It is the minimum number of operations needed to transform one string into the other, where an operation is an insertion, deletion, or substitution of a single character, or a transposition of two adjacent characters.
- Jaccard
- The Jaccard coefficient measures similarity between finite sample sets, and is defined as the size of the intersection divided by the size of the union of the sample sets.
- JaroWinkler
- The Jaro–Winkler distance metric is designed for, and best suited to, short strings such as person names, and to detecting typos. It is (roughly) a variation of Damerau-Levenshtein in which the substitution of 2 close characters is considered less important than the substitution of 2 characters that are far from each other.
- Levenshtein
- The Levenshtein distance, or edit distance, between two words is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other.
- LongestCommonSubsequence
- The longest common subsequence (LCS) problem consists of finding the longest subsequence common to two (or more) sequences. It differs from the problem of finding common substrings: unlike substrings, subsequences are not required to occupy consecutive positions within the original sequences.
- NormalizedStringDistance
- StringDistance
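To make the definitions above concrete, here is a minimal Python sketch of three of the measures: Levenshtein distance, LCS length, and the Jaccard coefficient. It is an illustration of the textbook definitions only, not this library's Dart implementation or API. The function names are hypothetical, and building the Jaccard sample sets from character n-grams is an assumption of this sketch.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string costs i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (free on a match)
            ))
        prev = curr
    return prev[-1]


def lcs_length(a: str, b: str) -> int:
    """Length of the longest subsequence common to a and b (not necessarily contiguous)."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def jaccard_coefficient(a: str, b: str, n: int = 2) -> float:
    """Size of the intersection divided by the size of the union of two sample sets.

    Assumption of this sketch: the sample sets are the character n-grams of each string.
    """
    grams_a = {a[i:i + n] for i in range(len(a) - n + 1)}
    grams_b = {b[i:i + n] for i in range(len(b) - n + 1)}
    union = grams_a | grams_b
    if not union:
        return 1.0  # two empty sets are treated as identical here
    return len(grams_a & grams_b) / len(union)


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))     # 3
    print(lcs_length("AGGTAB", "GXTXAYB"))      # 4 ("GTAB")
    print(jaccard_coefficient("abcd", "bcde"))  # 0.5
```

Both dynamic-programming functions keep only two rows of the table, so memory use grows with the length of the second string rather than with the product of both lengths.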
Constants
- defaultThreshold → const double
- jwCoef → const double