jaccardDistanceOf function
Returns the Jaccard distance between two strings.
Parameters
- source is the variant string.
- target is the prototype string.
- If ignoreCase is true, character case is ignored.
- If ignoreWhitespace is true, whitespace characters such as spaces, tabs, and newlines are ignored.
- If ignoreNumbers is true, numbers are ignored.
- If alphaNumericOnly is true, only letters and digits are compared.
- ngram is the size of a single item group. If ngram is 1, each character is treated as a separate item. If ngram is 2, each group of two consecutive characters is treated as one item. Any value below 2 falls back to comparing individual characters (UTF-16 code units).
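The following minimal Dart sketch makes the ngram grouping concrete. It assumes overlapping, sliding-window n-grams. The helper ngramSet is hypothetical and is not the library's splitStringToSet, whose exact splitting rules are not documented on this page.

```dart
/// Hypothetical helper (NOT the library's splitStringToSet): returns the set
/// of n-grams of [s], assuming overlapping windows of width [n].
Set<String> ngramSet(String s, int n) {
  // Mirrors the `ngram < 2` branch: each UTF-16 code unit is its own item.
  if (n < 2) return s.split('').toSet();
  final grams = <String>{};
  // Slide a window of width n; a string shorter than n yields no n-grams.
  for (var i = 0; i + n <= s.length; i++) {
    grams.add(s.substring(i, i + n));
  }
  return grams;
}
```

Under that assumption, ngramSet('night', 2) yields {ni, ig, gh, ht}, and ngramSet('night', 1) yields the five single characters.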
Details
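Jaccard distance, as defined here, is the number of distinct characters (or n-grams, when ngram is 2 or more) that are present in one string but not in the other. It is calculated by subtracting the size of the intersection of the source and target sets from the size of their union: |source ∪ target| - |source ∩ target|. The result is therefore an integer count (the size of the symmetric difference), not the normalized ratio 1 - |source ∩ target| / |source ∪ target|.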
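The set arithmetic can be sketched in Dart as follows. jaccardCount is a hypothetical stand-in for the library's jaccardDistance. The sketch assumes that function converts both inputs to sets before counting.

```dart
/// Hypothetical sketch of |A ∪ B| - |A ∩ B| (not the library's jaccardDistance).
/// Both inputs are converted to sets, so repeated items are counted once.
int jaccardCount<E>(Iterable<E> source, Iterable<E> target) {
  final a = source.toSet();
  final b = target.toSet();
  final union = a.union(b);
  final intersection = a.intersection(b);
  // Union minus intersection is exactly the symmetric difference.
  return union.length - intersection.length;
}
```

For example, jaccardCount('night'.codeUnits, 'nacht'.codeUnits) is 4. The union {n, i, g, h, t, a, c} has 7 elements and the intersection {n, h, t} has 3. The same value comes from the symmetric difference: {i, g} plus {a, c}.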
See Also: tverskyIndex, jaccardIndex
Complexity: Time O(n log n) | Space O(n)
Implementation
int jaccardDistanceOf(
  String source,
  String target, {
  int ngram = 1,
  bool ignoreCase = false,
  bool ignoreWhitespace = false,
  bool ignoreNumbers = false,
  bool alphaNumericOnly = false,
}) {
  // Normalize both inputs with the same cleanup options.
  source = cleanupString(
    source,
    ignoreCase: ignoreCase,
    ignoreWhitespace: ignoreWhitespace,
    ignoreNumbers: ignoreNumbers,
    alphaNumericOnly: alphaNumericOnly,
  );
  target = cleanupString(
    target,
    ignoreCase: ignoreCase,
    ignoreWhitespace: ignoreWhitespace,
    ignoreNumbers: ignoreNumbers,
    alphaNumericOnly: alphaNumericOnly,
  );
  if (ngram < 2) {
    // Compare individual characters (UTF-16 code units).
    return jaccardDistance(
      source.codeUnits,
      target.codeUnits,
    );
  } else {
    // Compare the sets of n-grams built from each string.
    return jaccardDistance(
      splitStringToSet(source, ngram),
      splitStringToSet(target, ngram),
    );
  }
}