LOOKUP

The LOOKUP function searches for a lookup value within a lookup array and returns an Excel Data Type containing the corresponding value from a return array, along with the similarity score and the matched text.

This function utilizes the jellyfish library to calculate similarity scores. For distance-based algorithms (Levenshtein, Damerau-Levenshtein, Hamming), the result is normalized to a similarity score between 0 and 1 using the formula: 1 - \frac{\text{distance}}{\max(\text{length}_1, \text{length}_2)}.

Supported algorithms include Levenshtein, Damerau-Levenshtein, Jaro, Jaro-Winkler, and Hamming.

Excel Usage

=LOOKUP(lookup_value, lookup_array, return_array, threshold, fuzzy_algo)
  • lookup_value (list[list], required): 2D range of text values to search for.
  • lookup_array (list[list], required): 2D range of text values to search within.
  • return_array (list[list], required): 2D range of values to return from.
  • threshold (float, optional, default: 0.6): Minimum similarity score (0 to 1) required for a match.
  • fuzzy_algo (str, optional, default: “levenshtein”): The string comparison algorithm to use.

Returns (list[list]): 2D list of Data Types containing the lookup result, similarity score, and matched text. Returns “no match” if no result is found above the threshold.

Example 1: Levenshtein lookup

Inputs:

lookup_value lookup_array return_array threshold fuzzy_algo
appl apple 1.5 0.7 levenshtein
banana 0.5

Excel formula:

=LOOKUP({"appl"}, {"apple";"banana"}, {1.5;0.5}, 0.7, "levenshtein")

Expected output:

{"type":"Double","basicValue":1.5,"properties":{"Result":{"type":"Double","basicValue":1.5},"Score":{"type":"Double","basicValue":0.8},"Match":{"type":"String","basicValue":"apple"}}}

Example 2: Jaro-Winkler lookup

Inputs:

lookup_value lookup_array return_array threshold fuzzy_algo
dixon dicksonx User A 0.8 jaro_winkler
dixon User B

Excel formula:

=LOOKUP({"dixon"}, {"dicksonx";"dixon"}, {"User A";"User B"}, 0.8, "jaro_winkler")

Expected output:

{"type":"String","basicValue":"User B","properties":{"Result":{"type":"String","basicValue":"User B"},"Score":{"type":"Double","basicValue":1},"Match":{"type":"String","basicValue":"dixon"}}}

Example 3: Multiple lookup values

Inputs:

lookup_value lookup_array return_array threshold fuzzy_algo
appl apple Fruit 1 0.7 jaro_winkler
bana banana Fruit 2
orange Fruit 3

Excel formula:

=LOOKUP({"appl";"bana"}, {"apple";"banana";"orange"}, {"Fruit 1";"Fruit 2";"Fruit 3"}, 0.7, "jaro_winkler")

Expected output:

Result
[object Object]
[object Object]
Example 4: No match found

Inputs:

lookup_value lookup_array return_array threshold fuzzy_algo
xyz apple 1 0.8 levenshtein
banana 2

Excel formula:

=LOOKUP({"xyz"}, {"apple";"banana"}, {1;2}, 0.8, "levenshtein")

Expected output:

{"type":"String","basicValue":"no match","properties":{"Result":{"type":"String","basicValue":""},"Score":{"type":"String","basicValue":""},"Match":{"type":"String","basicValue":""}}}

Python Code

Show Code
import jellyfish

def lookup(lookup_value, lookup_array, return_array, threshold=0.6, fuzzy_algo='levenshtein'):
    """
    Returns the value from a return array corresponding to the best fuzzy match.

    See: https://github.com/jamesturk/jellyfish

    This example function is provided as-is without any representation of accuracy.

    Args:
        lookup_value (list[list]): 2D range of text values to search for.
        lookup_array (list[list]): 2D range of text values to search within.
        return_array (list[list]): 2D range of values to return from.
        threshold (float, optional): Minimum similarity score (0 to 1) required for a match. Default is 0.6.
        fuzzy_algo (str, optional): The string comparison algorithm to use. Valid options: Levenshtein, Damerau-Levenshtein, Jaro, Jaro-Winkler, Hamming. Default is 'levenshtein'.

    Returns:
        list[list]: 2D list of Data Types containing the lookup result, similarity score, and matched text. Returns "no match" if no result is found above the threshold.
    """
    try:
        def to2d(x):
            if x is None: return [[""]]
            return [[x]] if not isinstance(x, list) else x

        def get_dt_type(val):
            if isinstance(val, bool): return "Boolean"
            if isinstance(val, (int, float)): return "Double"
            return "String"

        def get_similarity(s1, s2, algo):
            s1, s2 = str(s1), str(s2)
            if algo == "levenshtein":
                dist = jellyfish.levenshtein_distance(s1, s2)
                max_len = max(len(s1), len(s2))
                return 1.0 - (dist / max_len) if max_len > 0 else 1.0
            elif algo == "damerau_levenshtein":
                dist = jellyfish.damerau_levenshtein_distance(s1, s2)
                max_len = max(len(s1), len(s2))
                return 1.0 - (dist / max_len) if max_len > 0 else 1.0
            elif algo == "jaro":
                return jellyfish.jaro_similarity(s1, s2)
            elif algo == "jaro_winkler":
                return jellyfish.jaro_winkler_similarity(s1, s2)
            elif algo == "hamming":
                if len(s1) != len(s2):
                    return 0.0
                dist = jellyfish.hamming_distance(s1, s2)
                return 1.0 - (dist / len(s1)) if len(s1) > 0 else 1.0
            return 0.0

        needles = to2d(lookup_value)

        if not lookup_array or not return_array or len(lookup_array) != len(return_array) or len(lookup_array[0]) != len(return_array[0]):
            return "Error: lookup_array and return_array must have the same dimensions."

        haystack_flat = [item for sublist in lookup_array for item in sublist]
        return_flat = [item for sublist in return_array for item in sublist]

        results = []
        for row in needles:
            res_row = []
            for needle in row:
                if needle is None or str(needle).strip() == "":
                    res_row.append("")
                    continue

                best_idx = -1
                max_score = -1.0
                matched_text = ""

                for i, item in enumerate(haystack_flat):
                    if item is None:
                        continue

                    try:
                        score = float(get_similarity(needle, item, fuzzy_algo))
                        if score >= threshold and score > max_score:
                            max_score = score
                            best_idx = i
                            matched_text = str(item)
                    except Exception:
                        continue

                if best_idx != -1:
                    val = return_flat[best_idx]
                    v_type = get_dt_type(val)
                    res_row.append({
                        "type": v_type,
                        "basicValue": val,
                        "properties": {
                            "Result": {"type": v_type, "basicValue": val},
                            "Score": {"type": "Double", "basicValue": float(max_score)},
                            "Match": {"type": "String", "basicValue": matched_text}
                        }
                    })
                else:
                    res_row.append({
                        "type": "String",
                        "basicValue": "no match",
                        "properties": {
                            "Result": {"type": "String", "basicValue": ""},
                            "Score": {"type": "String", "basicValue": ""},
                            "Match": {"type": "String", "basicValue": ""}
                        }
                    })
            results.append(res_row)

        return results
    except Exception as e:
        return f"Error: {str(e)}"

Online Calculator

2D range of text values to search for.
2D range of text values to search within.
2D range of values to return from.
Minimum similarity score (0 to 1) required for a match.
The string comparison algorithm to use.