jaccardIndexOf function
Returns the Jaccard index between two strings.
Parameters
- source is the variant string.
- target is the prototype string.
- If ignoreCase is true, character case is ignored.
- If ignoreWhitespace is true, whitespace characters (spaces, tabs, newlines, etc.) are ignored.
- If ignoreNumbers is true, numbers are ignored.
- If alphaNumericOnly is true, only letters and digits are matched.
- ngram is the size of a single item group. If n = 1, each item is considered separately. If n = 2, two consecutive items are grouped together and treated as one.
TIP: Set both ignoreNumbers and alphaNumericOnly to true to ignore everything except letters.
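The filtering flags and the ngram grouping described above can be pictured with a small standalone sketch. The helpers `normalize` and `ngrams` are hypothetical names introduced here for illustration only and are not part of this library. The sketch also assumes ASCII input and overlapping groups, neither of which this page confirms.

```dart
/// Hypothetical helper (not part of the library): drops characters
/// according to the flags. Assumes ASCII letters and digits.
String normalize(
  String input, {
  bool ignoreCase = false,
  bool ignoreWhitespace = false,
  bool ignoreNumbers = false,
  bool alphaNumericOnly = false,
}) {
  var text = ignoreCase ? input.toLowerCase() : input;
  if (ignoreWhitespace) text = text.replaceAll(RegExp(r'\s'), '');
  if (ignoreNumbers) text = text.replaceAll(RegExp(r'[0-9]'), '');
  if (alphaNumericOnly) text = text.replaceAll(RegExp(r'[^A-Za-z0-9]'), '');
  return text;
}

/// Hypothetical helper (not part of the library): splits text into
/// overlapping groups of n consecutive characters (an assumption).
List<String> ngrams(String text, int n) {
  if (n < 1) throw ArgumentError.value(n, 'n', 'must be at least 1');
  return [
    for (var i = 0; i + n <= text.length; i++) text.substring(i, i + n),
  ];
}

void main() {
  // Combining both flags leaves only letters, as the tip describes.
  print(normalize('Route 66!', ignoreNumbers: true, alphaNumericOnly: true));
  // prints "Route"

  print(ngrams('night', 1)); // prints [n, i, g, h, t]
  print(ngrams('night', 2)); // prints [ni, ig, gh, ht]
}
```

With n = 2, a five-character string yields four groups, so larger ngram values make the comparison sensitive to character order rather than only to which characters occur.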
Details
The Jaccard index is a metric used to measure the similarity between two sample sets. It is also known by several other names:
- Jaccard similarity coefficient
- Tanimoto index
- Tanimoto coefficient
The Tversky index is a generalization of the Jaccard index: it reduces to the Jaccard index when alpha = 1 and beta = 1.
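The relationship between the two indices can be checked with a minimal sketch over plain Dart sets. The functions `jaccardSets` and `tverskySets` are hypothetical names used only for this illustration, not the library's API. Returning 1.0 for two empty sets is also an assumption, since this page does not say how that case is handled.

```dart
/// Tversky index over sets (illustrative, not the library's code):
/// |A ∩ B| / (|A ∩ B| + alpha * |A - B| + beta * |B - A|)
double tverskySets(
  Set<String> a,
  Set<String> b, {
  double alpha = 1,
  double beta = 1,
}) {
  final common = a.intersection(b).length;
  final onlyA = a.difference(b).length;
  final onlyB = b.difference(a).length;
  final denominator = common + alpha * onlyA + beta * onlyB;
  // Assumption: two empty sets are treated as identical.
  return denominator == 0 ? 1.0 : common / denominator;
}

/// Jaccard index over sets (illustrative): |A ∩ B| / |A ∪ B|
double jaccardSets(Set<String> a, Set<String> b) {
  final union = a.union(b).length;
  // Assumption: two empty sets are treated as identical.
  return union == 0 ? 1.0 : a.intersection(b).length / union;
}

void main() {
  final a = 'night'.split('').toSet(); // {n, i, g, h, t}
  final b = 'nacht'.split('').toSet(); // {n, a, c, h, t}

  // 3 shared characters (n, h, t) out of 7 distinct characters overall.
  print(jaccardSets(a, b));

  // With alpha = beta = 1 the Tversky denominator equals |A ∪ B|.
  print(tverskySets(a, b) == jaccardSets(a, b)); // prints true
}
```

With both weights at 1, the denominator counts the shared items plus the items unique to each side, which is exactly the size of the union.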
See Also: tverskyIndex, jaccardIndex
Complexity: Time O(n log n) | Space O(n)
Implementation
double jaccardIndexOf(
  String source,
  String target, {
  int ngram = 1,
  bool ignoreCase = false,
  bool ignoreWhitespace = false,
  bool ignoreNumbers = false,
  bool alphaNumericOnly = false,
}) {
  // The Jaccard index is the Tversky index with alpha = beta = 1.
  return tverskyIndexOf(
    source,
    target,
    alpha: 1,
    beta: 1,
    ngram: ngram,
    ignoreCase: ignoreCase,
    ignoreWhitespace: ignoreWhitespace,
    ignoreNumbers: ignoreNumbers,
    alphaNumericOnly: alphaNumericOnly,
  );
}