LOOKUP

The LOOKUP function searches for a lookup value within a lookup array and returns an Excel Data Type containing the corresponding value from a return array, along with the similarity score and the matched text.

This function utilizes the jellyfish library to calculate similarity scores. For distance-based algorithms (Levenshtein, Damerau-Levenshtein, Hamming), the result is normalized to a similarity score between 0 and 1 using the formula: 1 - \frac{\text{distance}}{\max(\text{length}_1, \text{length}_2)}.

Supported algorithms include Levenshtein, Damerau-Levenshtein, Jaro, Jaro-Winkler, and Hamming.

Excel Usage

=LOOKUP(lookup_value, lookup_array, return_array, threshold, fuzzy_algo)

lookup_value (list[list], required): 2D range of text values to search for.
lookup_array (list[list], required): 2D range of text values to search within.
return_array (list[list], required): 2D range of values to return from.
threshold (float, optional, default: 0.6): Minimum similarity score (0 to 1) required for a match.
fuzzy_algo (str, optional, default: “levenshtein”): The string comparison algorithm to use.

Returns (list[list]): 2D list of Data Types containing the lookup result, similarity score, and matched text. Returns “no match” if no result is found above the threshold.

Example 1: Levenshtein lookup

Inputs:

lookup_value	lookup_array	return_array	threshold	fuzzy_algo
appl	apple	1.5	0.7	levenshtein
	banana	0.5

Excel formula:

=LOOKUP({"appl"}, {"apple";"banana"}, {1.5;0.5}, 0.7, "levenshtein")

Expected output:

{"type":"Double","basicValue":1.5,"properties":{"Result":{"type":"Double","basicValue":1.5},"Score":{"type":"Double","basicValue":0.8},"Match":{"type":"String","basicValue":"apple"}}}

Example 2: Jaro-Winkler lookup

Inputs:

lookup_value	lookup_array	return_array	threshold	fuzzy_algo
dixon	dicksonx	User A	0.8	jaro_winkler
	dixon	User B

Excel formula:

=LOOKUP({"dixon"}, {"dicksonx";"dixon"}, {"User A";"User B"}, 0.8, "jaro_winkler")

Expected output:

{"type":"String","basicValue":"User B","properties":{"Result":{"type":"String","basicValue":"User B"},"Score":{"type":"Double","basicValue":1},"Match":{"type":"String","basicValue":"dixon"}}}

Example 3: Multiple lookup values

Inputs:

lookup_value	lookup_array	return_array	threshold	fuzzy_algo
appl	apple	Fruit 1	0.7	jaro_winkler
bana	banana	Fruit 2
	orange	Fruit 3

Excel formula:

=LOOKUP({"appl";"bana"}, {"apple";"banana";"orange"}, {"Fruit 1";"Fruit 2";"Fruit 3"}, 0.7, "jaro_winkler")

Expected output:

Result
[object Object]
[object Object]

Example 4: No match found

Inputs:

lookup_value	lookup_array	return_array	threshold	fuzzy_algo
xyz	apple	1	0.8	levenshtein
	banana	2

Excel formula:

=LOOKUP({"xyz"}, {"apple";"banana"}, {1;2}, 0.8, "levenshtein")

Expected output:

{"type":"String","basicValue":"no match","properties":{"Result":{"type":"String","basicValue":""},"Score":{"type":"String","basicValue":""},"Match":{"type":"String","basicValue":""}}}

Python Code

external_packages = ['jellyfish']

import jellyfish

def lookup(lookup_value, lookup_array, return_array, threshold=0.6, fuzzy_algo='levenshtein'):
    """
    Returns the value from a return array corresponding to the best fuzzy match.

    See: https://github.com/jamesturk/jellyfish

    This example function is provided as-is without any representation of accuracy.

    Args:
        lookup_value (list[list]): 2D range of text values to search for.
        lookup_array (list[list]): 2D range of text values to search within.
        return_array (list[list]): 2D range of values to return from.
        threshold (float, optional): Minimum similarity score (0 to 1) required for a match. Default is 0.6.
        fuzzy_algo (str, optional): The string comparison algorithm to use. Valid options: Levenshtein, Damerau-Levenshtein, Jaro, Jaro-Winkler, Hamming. Default is 'levenshtein'.

    Returns:
        list[list]: 2D list of Data Types containing the lookup result, similarity score, and matched text. Returns "no match" if no result is found above the threshold.
    """
    try:
        def to2d(x):
            if x is None: return [[""]]
            return [[x]] if not isinstance(x, list) else x

        def get_dt_type(val):
            if isinstance(val, bool): return "Boolean"
            if isinstance(val, (int, float)): return "Double"
            return "String"

        def get_similarity(s1, s2, algo):
            s1, s2 = str(s1), str(s2)
            if algo == "levenshtein":
                dist = jellyfish.levenshtein_distance(s1, s2)
                max_len = max(len(s1), len(s2))
                return 1.0 - (dist / max_len) if max_len > 0 else 1.0
            elif algo == "damerau_levenshtein":
                dist = jellyfish.damerau_levenshtein_distance(s1, s2)
                max_len = max(len(s1), len(s2))
                return 1.0 - (dist / max_len) if max_len > 0 else 1.0
            elif algo == "jaro":
                return jellyfish.jaro_similarity(s1, s2)
            elif algo == "jaro_winkler":
                return jellyfish.jaro_winkler_similarity(s1, s2)
            elif algo == "hamming":
                if len(s1) != len(s2):
                    return 0.0
                dist = jellyfish.hamming_distance(s1, s2)
                return 1.0 - (dist / len(s1)) if len(s1) > 0 else 1.0
            return 0.0

        needles = to2d(lookup_value)

        if not lookup_array or not return_array or len(lookup_array) != len(return_array) or len(lookup_array[0]) != len(return_array[0]):
            return "Error: lookup_array and return_array must have the same dimensions."

        haystack_flat = [item for sublist in lookup_array for item in sublist]
        return_flat = [item for sublist in return_array for item in sublist]

        results = []
        for row in needles:
            res_row = []
            for needle in row:
                if needle is None or str(needle).strip() == "":
                    res_row.append("")
                    continue

                best_idx = -1
                max_score = -1.0
                matched_text = ""

                for i, item in enumerate(haystack_flat):
                    if item is None:
                        continue

                    try:
                        score = float(get_similarity(needle, item, fuzzy_algo))
                        if score >= threshold and score > max_score:
                            max_score = score
                            best_idx = i
                            matched_text = str(item)
                    except Exception:
                        continue

                if best_idx != -1:
                    val = return_flat[best_idx]
                    v_type = get_dt_type(val)
                    res_row.append({
                        "type": v_type,
                        "basicValue": val,
                        "properties": {
                            "Result": {"type": v_type, "basicValue": val},
                            "Score": {"type": "Double", "basicValue": float(max_score)},
                            "Match": {"type": "String", "basicValue": matched_text}
                        }
                    })
                else:
                    res_row.append({
                        "type": "String",
                        "basicValue": "no match",
                        "properties": {
                            "Result": {"type": "String", "basicValue": ""},
                            "Score": {"type": "String", "basicValue": ""},
                            "Match": {"type": "String", "basicValue": ""}
                        }
                    })
            results.append(res_row)

        return results
    except Exception as e:
        return f"Error: {str(e)}"

Online Calculator

lookup_value *

2D range of text values to search for.

lookup_array *

2D range of text values to search within.

return_array *

2D range of values to return from.

threshold

Minimum similarity score (0 to 1) required for a match.

fuzzy_algo

The string comparison algorithm to use.