diceIndexOf function

double diceIndexOf(
  String source,
  String target, {
  int ngram = 1,
  bool ignoreCase = false,
  bool ignoreWhitespace = false,
  bool ignoreNumbers = false,
  bool alphaNumericOnly = false,
})

Finds the Sørensen–Dice coefficient of two strings.

Parameters

  • source is the variant string.
  • target is the prototype string.
  • If ignoreCase is true, character case is ignored.
  • If ignoreWhitespace is true, whitespace characters (spaces, tabs, newlines, etc.) are ignored.
  • If ignoreNumbers is true, numbers are ignored.
  • If alphaNumericOnly is true, only letters and digits are matched.
  • ngram is the size of a single item group. If ngram = 1, each item is considered separately. If ngram = 2, two consecutive items are grouped together and treated as one.

TIP: Set both ignoreNumbers and alphaNumericOnly to true to ignore everything except letters.
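To make the effect of these options concrete, the following is a minimal, self-contained sketch. It is not the package's implementation: the helper names normalize, ngrams, and diceSketch are hypothetical, the filters assume ASCII input, and it assumes that repeated n-grams are counted once (set semantics) and that two empty inputs score 1.0. The real function may differ on each of these points.

```dart
// Hypothetical sketch; NOT the package's actual implementation.

/// Applies the filtering options to [text] (ASCII-only assumption).
String normalize(
  String text, {
  bool ignoreCase = false,
  bool ignoreWhitespace = false,
  bool ignoreNumbers = false,
  bool alphaNumericOnly = false,
}) {
  var out = ignoreCase ? text.toLowerCase() : text;
  if (ignoreWhitespace) out = out.replaceAll(RegExp(r'\s'), '');
  if (ignoreNumbers) out = out.replaceAll(RegExp(r'[0-9]'), '');
  if (alphaNumericOnly) out = out.replaceAll(RegExp(r'[^A-Za-z0-9]'), '');
  return out;
}

/// Returns the distinct groups of [n] consecutive code units in [text].
Set<String> ngrams(String text, int n) {
  final grams = <String>{};
  for (var i = 0; i + n <= text.length; i++) {
    grams.add(text.substring(i, i + n));
  }
  return grams;
}

/// Dice coefficient over distinct n-grams: 2|X ∩ Y| / (|X| + |Y|).
double diceSketch(String a, String b, {int ngram = 1}) {
  final x = ngrams(a, ngram);
  final y = ngrams(b, ngram);
  if (x.isEmpty && y.isEmpty) return 1.0; // assumption: empty inputs match
  return 2 * x.intersection(y).length / (x.length + y.length);
}

void main() {
  // ignoreNumbers + alphaNumericOnly together leave only letters.
  final a = normalize('Night 42!',
      ignoreCase: true, ignoreNumbers: true, alphaNumericOnly: true);
  final b = normalize('nacht', ignoreCase: true);
  print(a); // night
  // Bigrams: {ni, ig, gh, ht} vs {na, ac, ch, ht}; only "ht" is shared.
  print(diceSketch(a, b, ngram: 2)); // 0.25
}
```

With ngram: 2 the two words share one bigram out of four each, giving 2 · 1 / (4 + 4) = 0.25 under the sketch's assumptions.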

Details

The Sørensen–Dice coefficient is a metric used to measure the similarity between two samples. It is also known by several other names:

  • Sørensen index
  • Dice's coefficient
  • Dice similarity coefficient (DSC)

The Tversky index is a generalization of the Dice index; it reduces to the Dice index when alpha = 0.5 and beta = 0.5.
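The relationship can be written out explicitly. The notation here is an assumption for illustration, not taken from the package: X and Y denote the sets of items (n-grams) extracted from source and target. Using the standard definitions of the two indices:

```latex
\[
T_{\alpha,\beta}(X, Y) =
  \frac{|X \cap Y|}{|X \cap Y| + \alpha\,|X \setminus Y| + \beta\,|Y \setminus X|}
\]
\[
T_{0.5,\,0.5}(X, Y)
  = \frac{2\,|X \cap Y|}{2\,|X \cap Y| + |X \setminus Y| + |Y \setminus X|}
  = \frac{2\,|X \cap Y|}{|X| + |Y|}
  = \mathrm{DSC}(X, Y)
\]
```

The last step uses |X| = |X ∩ Y| + |X \ Y| and |Y| = |X ∩ Y| + |Y \ X|.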

See Also: tverskyIndex, diceIndex


Complexity: Time O(n log n) | Space O(n)

Implementation

double diceIndexOf(
  String source,
  String target, {
  int ngram = 1,
  bool ignoreCase = false,
  bool ignoreWhitespace = false,
  bool ignoreNumbers = false,
  bool alphaNumericOnly = false,
}) {
  // The Dice index is the Tversky index with alpha = beta = 0.5.
  return tverskyIndexOf(
    source,
    target,
    alpha: 0.5,
    beta: 0.5,
    ngram: ngram,
    ignoreCase: ignoreCase,
    ignoreWhitespace: ignoreWhitespace,
    ignoreNumbers: ignoreNumbers,
    alphaNumericOnly: alphaNumericOnly,
  );
}