jaroWinklerSimilarity<E> function

double jaroWinklerSimilarity<E>(
  List<E> source,
  List<E> target, {
  int? maxPrefixSize,
  double? prefixScale,
  double threshold = 0.7,
})

Finds the Jaro-Winkler similarity index between two lists of items.

Parameters

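  • source and target are the two lists of items to compare.
  • threshold is the minimum Jaro similarity for the Winkler bonus: the bonus is applied only when the Jaro similarity is strictly greater than this value. Defaults to 0.7.
  • maxPrefixSize is the maximum prefix length to consider. If absent, the whole matching prefix is considered.
  • prefixScale is a constant scaling factor for how much the score is adjusted upwards for having a common prefix. If absent, the default min(0.1, 1 / max(n, m)) is used, where n and m are the lengths of source and target. The classic Jaro-Winkler formulation caps the considered prefix at 4 items; this implementation applies a cap only when maxPrefixSize is given. With a large custom prefixScale and a long common prefix, the result can exceed 1.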

Details

The Jaro similarity index between two lists of items is a weighted sum of the fraction of matched items in each list and the fraction of matched items that are not transposed. Winkler's modification increases this measure when the two lists share a common prefix.
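The description above can be written out as a formula. This is a sketch that assumes jaroSimilarity follows the standard Jaro definition, which this page does not confirm. The symbols c (number of matched items), t (half the number of matched items that appear in a different order), and ℓ (length of the considered common prefix) are introduced here for illustration; n and m are the lengths of source and target, and p is prefixScale.

```latex
\[
\mathrm{sim}_j =
\begin{cases}
0 & \text{if } c = 0,\\[4pt]
\dfrac{1}{3}\left(\dfrac{c}{n} + \dfrac{c}{m} + \dfrac{c - t}{c}\right) & \text{otherwise,}
\end{cases}
\qquad
\mathrm{sim}_w =
\begin{cases}
\mathrm{sim}_j + \ell\, p\,(1 - \mathrm{sim}_j) & \text{if } \mathrm{sim}_j > \text{threshold},\\[4pt]
\mathrm{sim}_j & \text{otherwise.}
\end{cases}
\]
```

For the classic pair MARTHA and MARHTA, c = 6 and t = 1, so sim_j = (1 + 1 + 5/6) / 3 ≈ 0.9444. With ℓ = 3 and p = 0.1, sim_w ≈ 0.9444 + 3 × 0.1 × 0.0556 ≈ 0.9611.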

See also: jaroSimilarity


Complexity: Time O(nm) | Space O(n+m), where n is the length of source and m is the length of target.

Implementation

double jaroWinklerSimilarity<E>(
  List<E> source,
  List<E> target, {
  int? maxPrefixSize,
  double? prefixScale,
  double threshold = 0.7,
}) {
  double jaro = jaroSimilarity(source, target);

  if (jaro > threshold) {
    // maximum length to find prefix match
    int len = min(source.length, target.length);
    if (maxPrefixSize != null && len > maxPrefixSize) {
      len = maxPrefixSize;
    }

    // Find matching prefix
    int l = 0;
    while (l < len && source[l] == target[l]) {
      l++;
    }

    // Add the Winkler bonus to the Jaro similarity index.
    // The default scale is min(0.1, 1 / longer length), which keeps l * p <= 1.
    double p = prefixScale ?? min(0.1, 1 / max(source.length, target.length));
    jaro += l * p * (1 - jaro);
  }

  return jaro;
}
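
The implementation above depends on jaroSimilarity, which is not shown on this page. The following self-contained sketch may help when experimenting with the parameters. It assumes the standard Jaro definition; jaroSketch and winklerSketch are hypothetical names used only here, and the library's own functions may differ in details such as how empty lists are handled.

```dart
import 'dart:math';

/// Hypothetical stand-in for jaroSimilarity, following the standard Jaro
/// definition. Assumption: two empty lists are treated as identical (1.0).
double jaroSketch<E>(List<E> source, List<E> target) {
  final n = source.length, m = target.length;
  if (n == 0 && m == 0) return 1.0;
  if (n == 0 || m == 0) return 0.0;

  // Two items match only if they are equal and at most `range` positions apart.
  final range = max(0, max(n, m) ~/ 2 - 1);
  final sourceMatched = List<bool>.filled(n, false);
  final targetMatched = List<bool>.filled(m, false);

  var matches = 0;
  for (var i = 0; i < n; i++) {
    final lo = max(0, i - range);
    final hi = min(m - 1, i + range);
    for (var j = lo; j <= hi; j++) {
      if (!targetMatched[j] && source[i] == target[j]) {
        sourceMatched[i] = true;
        targetMatched[j] = true;
        matches++;
        break;
      }
    }
  }
  if (matches == 0) return 0.0;

  // Walk the matched items of both lists in order; every pair that differs
  // is out of order, and two such pairs make one transposition.
  var outOfOrder = 0;
  var j = 0;
  for (var i = 0; i < n; i++) {
    if (!sourceMatched[i]) continue;
    while (!targetMatched[j]) {
      j++;
    }
    if (source[i] != target[j]) outOfOrder++;
    j++;
  }
  final t = outOfOrder / 2;

  return (matches / n + matches / m + (matches - t) / matches) / 3;
}

/// Mirrors the Winkler step shown in the Implementation section, but built
/// on jaroSketch instead of the library's jaroSimilarity.
double winklerSketch<E>(
  List<E> source,
  List<E> target, {
  int? maxPrefixSize,
  double? prefixScale,
  double threshold = 0.7,
}) {
  final score = jaroSketch(source, target);
  if (score <= threshold) return score;

  // Upper bound on the prefix length to compare.
  var limit = min(source.length, target.length);
  if (maxPrefixSize != null && limit > maxPrefixSize) limit = maxPrefixSize;

  // Length of the common prefix within that bound.
  var l = 0;
  while (l < limit && source[l] == target[l]) {
    l++;
  }

  final p = prefixScale ?? min(0.1, 1 / max(source.length, target.length));
  return score + l * p * (1 - score);
}

void main() {
  final source = 'MARTHA'.split('');
  final target = 'MARHTA'.split('');

  print(jaroSketch(source, target).toStringAsFixed(4)); // 0.9444
  print(winklerSketch(source, target).toStringAsFixed(4)); // 0.9611
  // Limiting the prefix to one item shrinks the bonus.
  print(winklerSketch(source, target, maxPrefixSize: 1).toStringAsFixed(4)); // 0.9500
}
```

Raising threshold above the Jaro score (for example threshold: 0.95 for the pair above) disables the bonus, so winklerSketch returns the plain Jaro value.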