tverskyIndexOf function

double tverskyIndexOf(
  String source,
  String target, {
  int ngram = 1,
  double alpha = 0.5,
  double beta = 0.5,
  bool ignoreCase = false,
  bool ignoreWhitespace = false,
  bool ignoreNumbers = false,
  bool alphaNumericOnly = false,
})

Finds the Tversky similarity index between two strings.

Parameters

  • source is the variant string
  • target is the prototype string
  • alpha is the variant coefficient. Default is 0.5
  • beta is the prototype coefficient. Default is 0.5
  • if ignoreCase is true, character case is ignored.
  • if ignoreWhitespace is true, whitespace characters (spaces, tabs, newlines, etc.) are ignored.
  • if ignoreNumbers is true, numbers will be ignored.
  • if alphaNumericOnly is true, only letters and digits will be matched.
  • ngram is the size of a single item group. Default is 1. If ngram = 1, each item is considered separately. If ngram = 2, every two consecutive items are grouped together and treated as one.

Tip: Set both ignoreNumbers and alphaNumericOnly to true to ignore everything except letters.
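
To illustrate what the ngram parameter means, here is a minimal sketch of grouping consecutive characters into items. The helper ngramsOf is hypothetical and written only for this example; it is not the library's splitStringToSet and may differ from it in details.

```dart
// Hypothetical helper for illustration only (not part of the library).
// Collects every run of `n` consecutive UTF-16 code units as one item.
Set<String> ngramsOf(String text, int n) {
  final grams = <String>{};
  // Slide a window of width n across the string.
  for (var i = 0; i + n <= text.length; i++) {
    grams.add(text.substring(i, i + n));
  }
  return grams;
}

void main() {
  print(ngramsOf('night', 1)); // {n, i, g, h, t}
  print(ngramsOf('night', 2)); // {ni, ig, gh, ht}
}
```

With a larger group size, two strings must share longer runs of characters to count as having an item in common, so the comparison becomes stricter about character order.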

Details

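The Tversky index is an asymmetric similarity measure between sets that compares a variant with a prototype. It generalizes the Sørensen–Dice coefficient and the Jaccard index: with alpha = beta = 0.5 it equals the Sørensen–Dice coefficient, and with alpha = beta = 1 it equals the Jaccard index.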
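
The following is a minimal sketch of the textbook Tversky formula, |X ∩ Y| / (|X ∩ Y| + alpha·|X − Y| + beta·|Y − X|), applied to two sets. The function tverskySketch is a hypothetical name used for illustration; it is not the library's tverskyIndex, which may handle edge cases differently.

```dart
// Illustrative re-implementation of the textbook formula (an assumption,
// not the library's code). `variant` is X and `prototype` is Y.
double tverskySketch<T>(
  Set<T> variant,
  Set<T> prototype, {
  double alpha = 0.5,
  double beta = 0.5,
}) {
  final common = variant.intersection(prototype).length;
  final onlyVariant = variant.length - common; // |X - Y|
  final onlyPrototype = prototype.length - common; // |Y - X|
  return common / (common + alpha * onlyVariant + beta * onlyPrototype);
}

void main() {
  final x = {'a', 'b', 'c'};
  final y = {'a', 'b', 'c', 'd', 'e'};
  // 3 shared items, 0 unique to x, 2 unique to y.
  print(tverskySketch(x, y)); // Dice weights: 3 / (3 + 0.5 * 2)
  print(tverskySketch(x, y, alpha: 1, beta: 1)); // Jaccard weights: 3 / 5
  // Asymmetry: penalizing only the variant's extras ignores y's extras.
  print(tverskySketch(x, y, alpha: 1, beta: 0)); // 3 / (3 + 0)
  print(tverskySketch(y, x, alpha: 1, beta: 0)); // 3 / (3 + 2)
}
```

Swapping the arguments changes the result whenever alpha and beta differ, which is what makes the measure asymmetric.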

It may return NaN depending on the values of alpha and beta.
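
As a sketch of how a NaN can arise, assume the index follows the textbook formula common / (common + alpha·a + beta·b), where a and b count the items unique to each side. Whenever the denominator is zero, Dart's floating-point division 0 / 0 yields NaN. The function tverskyFromCounts below is hypothetical and only demonstrates the arithmetic; the library may guard some of these cases.

```dart
// Hypothetical helper: evaluates the textbook formula from raw counts.
// This is an assumption about the arithmetic, not the library's code.
double tverskyFromCounts(
  int common,
  int onlyVariant,
  int onlyPrototype,
  double alpha,
  double beta,
) =>
    common / (common + alpha * onlyVariant + beta * onlyPrototype);

void main() {
  // Disjoint inputs with both coefficients set to zero: 0 / 0.
  print(tverskyFromCounts(0, 2, 3, 0, 0).isNaN); // true
  // Two empty inputs: every count is zero, so again 0 / 0.
  print(tverskyFromCounts(0, 0, 0, 0.5, 0.5).isNaN); // true

  // A caller can test the result before using it.
  final score = tverskyFromCounts(0, 2, 3, 0, 0);
  print(score.isNaN ? 0.0 : score);
}
```

Because NaN compares unequal to everything, including itself, check the result with isNaN rather than with ==.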

See Also: tverskyIndex


Complexity: Time O(n log n) | Space O(n)

Implementation

double tverskyIndexOf(
  String source,
  String target, {
  int ngram = 1,
  double alpha = 0.5,
  double beta = 0.5,
  bool ignoreCase = false,
  bool ignoreWhitespace = false,
  bool ignoreNumbers = false,
  bool alphaNumericOnly = false,
}) {
  // Normalize both strings according to the ignore* flags.
  source = cleanupString(
    source,
    ignoreCase: ignoreCase,
    ignoreWhitespace: ignoreWhitespace,
    ignoreNumbers: ignoreNumbers,
    alphaNumericOnly: alphaNumericOnly,
  );
  target = cleanupString(
    target,
    ignoreCase: ignoreCase,
    ignoreWhitespace: ignoreWhitespace,
    ignoreNumbers: ignoreNumbers,
    alphaNumericOnly: alphaNumericOnly,
  );
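  // With ngram < 2, compare individual UTF-16 code units;
  // otherwise compare the groups produced by splitStringToSet.
  if (ngram < 2) {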
    return tverskyIndex(
      source.codeUnits,
      target.codeUnits,
      alpha: alpha,
      beta: beta,
    );
  } else {
    return tverskyIndex(
      splitStringToSet(source, ngram),
      splitStringToSet(target, ngram),
      alpha: alpha,
      beta: beta,
    );
  }
}