MATCH
The MATCH function searches for a lookup value within a lookup array and returns an Excel Data Type containing the 1-based position of the best match, along with the similarity score, raw distance (for edit-based algorithms), and the matched text.
This function utilizes the jellyfish library to calculate similarity scores. For distance-based algorithms (Levenshtein, Damerau-Levenshtein, Hamming), the result is normalized to a similarity score between 0 and 1 using the formula: 1 - \frac{\text{distance}}{\max(\text{length}_1, \text{length}_2)}.
Supported algorithms include Levenshtein, Damerau-Levenshtein, Jaro, Jaro-Winkler, and Hamming.
Excel Usage
=MATCH(lookup_value, lookup_array, threshold, fuzzy_algo)
lookup_value(list[list], required): 2D range of text values to search for.lookup_array(list[list], required): 2D range of text values to search within.threshold(float, optional, default: 0.6): Minimum similarity score (0 to 1) required for a match.fuzzy_algo(str, optional, default: “levenshtein”): The string comparison algorithm to use.
Returns (list[list]): 2D list of Data Types containing the 1-based index, similarity score, and distance (if applicable) of the best match. Returns “no match” if no result is found above the threshold.
Example 1: Levenshtein match
Inputs:
| lookup_value | lookup_array | threshold | fuzzy_algo |
|---|---|---|---|
| appl | apple | 0.7 | levenshtein |
| banana |
Excel formula:
=MATCH({"appl"}, {"apple";"banana"}, 0.7, "levenshtein")
Expected output:
{"type":"Double","basicValue":1,"properties":{"Index":{"type":"Double","basicValue":1},"Score":{"type":"Double","basicValue":0.8},"Match":{"type":"String","basicValue":"apple"},"Distance":{"type":"Double","basicValue":1}}}
Example 2: Jaro-Winkler match
Inputs:
| lookup_value | lookup_array | threshold | fuzzy_algo |
|---|---|---|---|
| dixon | dicksonx | 0.8 | jaro_winkler |
| dixon |
Excel formula:
=MATCH({"dixon"}, {"dicksonx";"dixon"}, 0.8, "jaro_winkler")
Expected output:
{"type":"Double","basicValue":2,"properties":{"Index":{"type":"Double","basicValue":2},"Score":{"type":"Double","basicValue":1},"Match":{"type":"String","basicValue":"dixon"}}}
Example 3: Multiple lookup values
Inputs:
| lookup_value | lookup_array | threshold | fuzzy_algo |
|---|---|---|---|
| appl | apple | 0.7 | levenshtein |
| bana | banana | ||
| orange |
Excel formula:
=MATCH({"appl";"bana"}, {"apple";"banana";"orange"}, 0.7, "levenshtein")
Expected output:
| Result |
|---|
| [object Object] |
| [object Object] |
Example 4: No match found
Inputs:
| lookup_value | lookup_array | threshold | fuzzy_algo |
|---|---|---|---|
| xyz | apple | 0.8 | levenshtein |
| banana |
Excel formula:
=MATCH({"xyz"}, {"apple";"banana"}, 0.8, "levenshtein")
Expected output:
{"type":"String","basicValue":"no match","properties":{"Index":{"type":"String","basicValue":""},"Score":{"type":"String","basicValue":""},"Match":{"type":"String","basicValue":""},"Distance":{"type":"String","basicValue":""}}}
Example 5: Hamming match (equal length)
Inputs:
| lookup_value | lookup_array | threshold | fuzzy_algo |
|---|---|---|---|
| apple | applz | 0.7 | hamming |
| banana |
Excel formula:
=MATCH({"apple"}, {"applz";"banana"}, 0.7, "hamming")
Expected output:
{"type":"Double","basicValue":1,"properties":{"Index":{"type":"Double","basicValue":1},"Score":{"type":"Double","basicValue":0.8},"Match":{"type":"String","basicValue":"applz"},"Distance":{"type":"Double","basicValue":1}}}
Python Code
Show Code
import jellyfish
def match(lookup_value, lookup_array, threshold=0.6, fuzzy_algo='levenshtein'):
"""
Returns the index of the best fuzzy match.
See: https://github.com/jamesturk/jellyfish
This example function is provided as-is without any representation of accuracy.
Args:
lookup_value (list[list]): 2D range of text values to search for.
lookup_array (list[list]): 2D range of text values to search within.
threshold (float, optional): Minimum similarity score (0 to 1) required for a match. Default is 0.6.
fuzzy_algo (str, optional): The string comparison algorithm to use. Valid options: Levenshtein, Damerau-Levenshtein, Jaro, Jaro-Winkler, Hamming. Default is 'levenshtein'.
Returns:
list[list]: 2D list of Data Types containing the 1-based index, similarity score, and distance (if applicable) of the best match. Returns "no match" if no result is found above the threshold.
"""
try:
def to2d(x):
if x is None: return [[""]]
return [[x]] if not isinstance(x, list) else x
def get_metrics(s1, s2, algo):
s1, s2 = str(s1), str(s2)
dist = None
if algo == "levenshtein":
dist = jellyfish.levenshtein_distance(s1, s2)
max_len = max(len(s1), len(s2))
score = 1.0 - (dist / max_len) if max_len > 0 else 1.0
elif algo == "damerau_levenshtein":
dist = jellyfish.damerau_levenshtein_distance(s1, s2)
max_len = max(len(s1), len(s2))
score = 1.0 - (dist / max_len) if max_len > 0 else 1.0
elif algo == "jaro":
score = jellyfish.jaro_similarity(s1, s2)
elif algo == "jaro_winkler":
score = jellyfish.jaro_winkler_similarity(s1, s2)
elif algo == "hamming":
if len(s1) != len(s2):
return 0.0, None
dist = jellyfish.hamming_distance(s1, s2)
score = 1.0 - (dist / len(s1)) if len(s1) > 0 else 1.0
else:
score = 0.0
return score, dist
needles = to2d(lookup_value)
haystack_flat = [item for sublist in lookup_array for item in sublist]
results = []
for row in needles:
res_row = []
for needle in row:
if needle is None or str(needle).strip() == "":
res_row.append("")
continue
best_idx = -1
max_score = -1.0
best_dist = None
matched_text = ""
for i, item in enumerate(haystack_flat):
if item is None:
continue
try:
score, dist = get_metrics(needle, item, fuzzy_algo)
if score >= threshold and score > max_score:
max_score = score
best_dist = dist
best_idx = i
matched_text = str(item)
except Exception:
continue
if best_idx != -1:
index = best_idx + 1
props = {
"Index": {"type": "Double", "basicValue": float(index)},
"Score": {"type": "Double", "basicValue": float(max_score)},
"Match": {"type": "String", "basicValue": matched_text}
}
if best_dist is not None:
props["Distance"] = {"type": "Double", "basicValue": float(best_dist)}
res_row.append({
"type": "Double",
"basicValue": float(index),
"properties": props
})
else:
res_row.append({
"type": "String",
"basicValue": "no match",
"properties": {
"Index": {"type": "String", "basicValue": ""},
"Score": {"type": "String", "basicValue": ""},
"Match": {"type": "String", "basicValue": ""},
"Distance": {"type": "String", "basicValue": ""}
}
})
results.append(res_row)
return results
except Exception as e:
return f"Error: {str(e)}"