MATCH

The MATCH function searches for a lookup value within a lookup array and returns an Excel Data Type containing the 1-based position of the best match, along with the similarity score, raw distance (for edit-based algorithms), and the matched text.

This function utilizes the jellyfish library to calculate similarity scores. For distance-based algorithms (Levenshtein, Damerau-Levenshtein, Hamming), the result is normalized to a similarity score between 0 and 1 using the formula: 1 - \frac{\text{distance}}{\max(\text{length}_1, \text{length}_2)}.

Supported algorithms include Levenshtein, Damerau-Levenshtein, Jaro, Jaro-Winkler, and Hamming.

Excel Usage

=MATCH(lookup_value, lookup_array, threshold, fuzzy_algo)
  • lookup_value (list[list], required): 2D range of text values to search for.
  • lookup_array (list[list], required): 2D range of text values to search within.
  • threshold (float, optional, default: 0.6): Minimum similarity score (0 to 1) required for a match.
  • fuzzy_algo (str, optional, default: “levenshtein”): The string comparison algorithm to use.

Returns (list[list]): 2D list of Data Types containing the 1-based index, similarity score, and distance (if applicable) of the best match. Returns “no match” if no result is found above the threshold.

Example 1: Levenshtein match

Inputs:

lookup_value lookup_array threshold fuzzy_algo
appl apple 0.7 levenshtein
banana

Excel formula:

=MATCH({"appl"}, {"apple";"banana"}, 0.7, "levenshtein")

Expected output:

{"type":"Double","basicValue":1,"properties":{"Index":{"type":"Double","basicValue":1},"Score":{"type":"Double","basicValue":0.8},"Match":{"type":"String","basicValue":"apple"},"Distance":{"type":"Double","basicValue":1}}}

Example 2: Jaro-Winkler match

Inputs:

lookup_value lookup_array threshold fuzzy_algo
dixon dicksonx 0.8 jaro_winkler
dixon

Excel formula:

=MATCH({"dixon"}, {"dicksonx";"dixon"}, 0.8, "jaro_winkler")

Expected output:

{"type":"Double","basicValue":2,"properties":{"Index":{"type":"Double","basicValue":2},"Score":{"type":"Double","basicValue":1},"Match":{"type":"String","basicValue":"dixon"}}}

Example 3: Multiple lookup values

Inputs:

lookup_value lookup_array threshold fuzzy_algo
appl apple 0.7 levenshtein
bana banana
orange

Excel formula:

=MATCH({"appl";"bana"}, {"apple";"banana";"orange"}, 0.7, "levenshtein")

Expected output:

Result
[object Object]
[object Object]
Example 4: No match found

Inputs:

lookup_value lookup_array threshold fuzzy_algo
xyz apple 0.8 levenshtein
banana

Excel formula:

=MATCH({"xyz"}, {"apple";"banana"}, 0.8, "levenshtein")

Expected output:

{"type":"String","basicValue":"no match","properties":{"Index":{"type":"String","basicValue":""},"Score":{"type":"String","basicValue":""},"Match":{"type":"String","basicValue":""},"Distance":{"type":"String","basicValue":""}}}

Example 5: Hamming match (equal length)

Inputs:

lookup_value lookup_array threshold fuzzy_algo
apple applz 0.7 hamming
banana

Excel formula:

=MATCH({"apple"}, {"applz";"banana"}, 0.7, "hamming")

Expected output:

{"type":"Double","basicValue":1,"properties":{"Index":{"type":"Double","basicValue":1},"Score":{"type":"Double","basicValue":0.8},"Match":{"type":"String","basicValue":"applz"},"Distance":{"type":"Double","basicValue":1}}}

Python Code

Show Code
import jellyfish

def match(lookup_value, lookup_array, threshold=0.6, fuzzy_algo='levenshtein'):
    """
    Returns the index of the best fuzzy match.

    See: https://github.com/jamesturk/jellyfish

    This example function is provided as-is without any representation of accuracy.

    Args:
        lookup_value (list[list]): 2D range of text values to search for.
        lookup_array (list[list]): 2D range of text values to search within.
        threshold (float, optional): Minimum similarity score (0 to 1) required for a match. Default is 0.6.
        fuzzy_algo (str, optional): The string comparison algorithm to use. Valid options: Levenshtein, Damerau-Levenshtein, Jaro, Jaro-Winkler, Hamming. Default is 'levenshtein'.

    Returns:
        list[list]: 2D list of Data Types containing the 1-based index, similarity score, and distance (if applicable) of the best match. Returns "no match" if no result is found above the threshold.
    """
    try:
        def to2d(x):
            if x is None: return [[""]]
            return [[x]] if not isinstance(x, list) else x

        def get_metrics(s1, s2, algo):
            s1, s2 = str(s1), str(s2)
            dist = None
            if algo == "levenshtein":
                dist = jellyfish.levenshtein_distance(s1, s2)
                max_len = max(len(s1), len(s2))
                score = 1.0 - (dist / max_len) if max_len > 0 else 1.0
            elif algo == "damerau_levenshtein":
                dist = jellyfish.damerau_levenshtein_distance(s1, s2)
                max_len = max(len(s1), len(s2))
                score = 1.0 - (dist / max_len) if max_len > 0 else 1.0
            elif algo == "jaro":
                score = jellyfish.jaro_similarity(s1, s2)
            elif algo == "jaro_winkler":
                score = jellyfish.jaro_winkler_similarity(s1, s2)
            elif algo == "hamming":
                if len(s1) != len(s2):
                    return 0.0, None
                dist = jellyfish.hamming_distance(s1, s2)
                score = 1.0 - (dist / len(s1)) if len(s1) > 0 else 1.0
            else:
                score = 0.0
            return score, dist

        needles = to2d(lookup_value)
        haystack_flat = [item for sublist in lookup_array for item in sublist]

        results = []
        for row in needles:
            res_row = []
            for needle in row:
                if needle is None or str(needle).strip() == "":
                    res_row.append("")
                    continue

                best_idx = -1
                max_score = -1.0
                best_dist = None
                matched_text = ""

                for i, item in enumerate(haystack_flat):
                    if item is None:
                        continue

                    try:
                        score, dist = get_metrics(needle, item, fuzzy_algo)
                        if score >= threshold and score > max_score:
                            max_score = score
                            best_dist = dist
                            best_idx = i
                            matched_text = str(item)
                    except Exception:
                        continue

                if best_idx != -1:
                    index = best_idx + 1
                    props = {
                        "Index": {"type": "Double", "basicValue": float(index)},
                        "Score": {"type": "Double", "basicValue": float(max_score)},
                        "Match": {"type": "String", "basicValue": matched_text}
                    }
                    if best_dist is not None:
                        props["Distance"] = {"type": "Double", "basicValue": float(best_dist)}

                    res_row.append({
                        "type": "Double",
                        "basicValue": float(index),
                        "properties": props
                    })
                else:
                    res_row.append({
                        "type": "String",
                        "basicValue": "no match",
                        "properties": {
                            "Index": {"type": "String", "basicValue": ""},
                            "Score": {"type": "String", "basicValue": ""},
                            "Match": {"type": "String", "basicValue": ""},
                            "Distance": {"type": "String", "basicValue": ""}
                        }
                    })
            results.append(res_row)

        return results
    except Exception as e:
        return f"Error: {str(e)}"

Online Calculator

2D range of text values to search for.
2D range of text values to search within.
Minimum similarity score (0 to 1) required for a match.
The string comparison algorithm to use.