edit_distance library

Edit distance algorithms for fuzzy matching.

Classes

CombinedJaccard
Combines multiple normalized Jaccard distances computed for N=1, N=2, N=3, ... The individual distances are weighted by N^2.
Damerau
Implementation of the Damerau-Levenshtein distance with transposition (sometimes also called the unrestricted Damerau-Levenshtein distance). It is the minimum number of operations needed to transform one string into the other, where an operation is defined as an insertion, deletion, or substitution of a single character, or a transposition of two adjacent characters.
Jaccard
The Jaccard coefficient measures similarity between finite sample sets, and is defined as the size of the intersection divided by the size of the union of the sample sets.
JaroWinkler
The Jaro–Winkler distance metric is designed for, and best suited to, short strings such as person names, and to detecting typos. It is (roughly) a variation of Damerau-Levenshtein in which the substitution of two close characters is considered less important than the substitution of two characters that are far from each other.
Levenshtein
The Levenshtein distance, or edit distance, between two words is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other.
LongestCommonSubsequence
The longest common subsequence (LCS) problem consists in finding the longest subsequence common to two (or more) sequences. It differs from problems of finding common substrings: unlike substrings, subsequences are not required to occupy consecutive positions within the original sequences.
NormalizedStringDistance
StringDistance
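
The difference between the Levenshtein and Damerau descriptions above is easiest to see side by side. The following is a minimal, language-independent sketch in Python, not this library's implementation or API; the function names `levenshtein` and `damerau` are hypothetical and chosen only for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (two-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def damerau(a: str, b: str) -> int:
    """Unrestricted Damerau-Levenshtein distance: as above, plus
    transposition of two adjacent characters as a single operation."""
    inf = len(a) + len(b)
    # d is shifted by one: d[i + 1][j + 1] is the distance between a[:i] and b[:j].
    # Row 0 and column 0 form a sentinel border filled with inf.
    d = [[inf] * (len(b) + 2) for _ in range(len(a) + 2)]
    for i in range(len(a) + 1):
        d[i + 1][1] = i
    for j in range(len(b) + 1):
        d[1][j + 1] = j
    last_row = {}  # last 1-based position in a where each character was seen
    for i in range(1, len(a) + 1):
        last_col = 0  # last 1-based position in b that matched a[i - 1]
        for j in range(1, len(b) + 1):
            i1 = last_row.get(b[j - 1], 0)
            j1 = last_col
            cost = 1
            if a[i - 1] == b[j - 1]:
                cost = 0
                last_col = j
            d[i + 1][j + 1] = min(
                d[i][j] + cost,                               # substitution or match
                d[i + 1][j] + 1,                              # insertion
                d[i][j + 1] + 1,                              # deletion
                d[i1][j1] + (i - i1 - 1) + 1 + (j - j1 - 1),  # transposition
            )
        last_row[a[i - 1]] = i
    return d[len(a) + 1][len(b) + 1]


print(levenshtein("ab", "ba"), damerau("ab", "ba"))
print(levenshtein("ca", "abc"), damerau("ca", "abc"))
```

Swapping two adjacent characters costs two edits under Levenshtein but one under Damerau. The pair "ca" and "abc" is a case where the unrestricted variant (which may edit between the transposed characters) gives a smaller result than plain Levenshtein.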
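
The Jaccard and CombinedJaccard descriptions above can be illustrated with a short Python sketch. Several details here are assumptions and not taken from this library: that N denotes the character n-gram size, that n-grams are collected into sets, that two strings without any n-grams are treated as identical, that the distance is one minus the coefficient, and that the N^2 weights are applied as a weighted average. All function names and the `max_n` parameter are hypothetical.

```python
def ngrams(s: str, n: int) -> set:
    """Set of all character n-grams of s (empty if s is shorter than n)."""
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def jaccard_similarity(a: str, b: str, n: int = 2) -> float:
    """Size of the intersection divided by the size of the union of the n-gram sets."""
    x, y = ngrams(a, n), ngrams(b, n)
    union = x | y
    if not union:
        return 1.0  # assumed convention: no n-grams on either side counts as identical
    return len(x & y) / len(union)


def jaccard_distance(a: str, b: str, n: int = 2) -> float:
    """Normalized distance in [0, 1], assumed to be 1 minus the coefficient."""
    return 1.0 - jaccard_similarity(a, b, n)


def combined_jaccard_distance(a: str, b: str, max_n: int = 3) -> float:
    """Average of the distances for N = 1..max_n, each weighted by N^2 (assumed scheme)."""
    sizes = range(1, max_n + 1)
    total_weight = sum(n * n for n in sizes)
    return sum(n * n * jaccard_distance(a, b, n) for n in sizes) / total_weight


print(jaccard_similarity("night", "nacht", 2))
print(combined_jaccard_distance("abc", "abd", max_n=2))
```

With weights of N^2, longer n-grams dominate the combined value, so strings that share characters but not character order are pushed further apart than they would be with unigrams alone.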
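
The distinction between subsequences and substrings in the LongestCommonSubsequence description can be made concrete with a small Python sketch. It is an illustration under stated assumptions, not this library's code: the function names are hypothetical, and the derived distance shown (string lengths minus twice the LCS length, that is, an edit distance allowing only insertions and deletions) is a common definition that this page does not confirm for the class.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (two-row DP)."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr.append(prev[j - 1] + 1)            # extend a common subsequence
            else:
                curr.append(max(prev[j], curr[j - 1]))  # skip a character in a or in b
        prev = curr
    return prev[-1]


def lcs_distance(a: str, b: str) -> int:
    """Insert/delete-only edit distance derived from the LCS length (assumed definition)."""
    return len(a) + len(b) - 2 * lcs_length(a, b)


print(lcs_length("abcde", "ace"))
print(lcs_distance("abcde", "ace"))
```

For "abcde" and "ace", the common subsequence "ace" has length 3 even though its characters are not adjacent in the first string; the longest common substring of the same pair is a single character.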

Constants

defaultThreshold → const double
jwCoef → const double