KNN_CLASSIFY
K-nearest neighbors classification predicts the label of a sample based on the labels of its k closest neighbors in the training set. The distance between samples x and y is typically measured using the Minkowski distance:
d(x, y) = \left( \sum_{i=1}^n |x_i - y_i|^p \right)^{1/p}
When p=2, this corresponds to the standard Euclidean distance, and when p=1, it is the Manhattan distance. The function can also return class probabilities derived from the proportion of neighbors supporting each class.
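The formula and its two special cases can be checked directly; a minimal sketch (the helper name `minkowski_distance` is ours, not part of the wrapper):

```python
import numpy as np

def minkowski_distance(x, y, p=2):
    """Minkowski distance between two equal-length vectors."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.sum(np.abs(x - y) ** p) ** (1.0 / p))

# p=2 reduces to Euclidean distance, p=1 to Manhattan distance.
minkowski_distance([0, 0], [3, 4], p=2)  # 5.0 (the 3-4-5 triangle)
minkowski_distance([0, 0], [3, 4], p=1)  # 7.0 (3 + 4)
```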
This wrapper accepts rows as samples and a target supplied as a single row or single column. It returns training accuracy together with predicted labels, class counts, fitted class probabilities, and the resolved distance metric.
Excel Usage
=KNN_CLASSIFY(data, target, n_neighbors, knn_weights, knn_metric, p)
- data (list[list], required): 2D array of numeric feature data with rows as samples and columns as features.
- target (list[list], required): Target labels as a single row, single column, or scalar when only one sample is present.
- n_neighbors (int, optional, default: 5): Number of nearest neighbors used for each vote.
- knn_weights (str, optional, default: "uniform"): Weighting scheme used when aggregating neighbor votes. Valid options: "uniform", "distance".
- knn_metric (str, optional, default: "minkowski"): Distance metric used to compare samples. Valid options: "minkowski", "euclidean", "manhattan".
- p (int, optional, default: 2): Power parameter for the Minkowski metric.
Returns (dict): Excel data type containing training accuracy, predictions, probabilities, and k-nearest-neighbor summary properties.
Example 1: Classify two string-labeled groups with uniform neighbor votes
Inputs:
| data | target | n_neighbors | knn_weights | knn_metric | p |
|---|---|---|---|---|---|
| 0 | low | 3 | uniform | euclidean | 2 |
| 0.1 | low | ||||
| 0.2 | low | ||||
| 1.5 | high | ||||
| 1.6 | high | ||||
| 1.7 | high |
Excel formula:
=KNN_CLASSIFY({0;0.1;0.2;1.5;1.6;1.7}, {"low";"low";"low";"high";"high";"high"}, 3, "uniform", "euclidean", 2)
Expected output:
{"type":"Double","basicValue":1,"properties":{"accuracy":{"type":"Double","basicValue":1},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":1},"class_count":{"type":"Double","basicValue":2},"classes":{"type":"Array","elements":[[{"type":"String","basicValue":"high"}],[{"type":"String","basicValue":"low"}]]},"predictions":{"type":"Array","elements":[[{"type":"String","basicValue":"low"}],[{"type":"String","basicValue":"low"}],[{"type":"String","basicValue":"low"}],[{"type":"String","basicValue":"high"}],[{"type":"String","basicValue":"high"}],[{"type":"String","basicValue":"high"}]]},"prediction_counts":{"type":"Array","elements":[[{"type":"String","basicValue":"class"},{"type":"String","basicValue":"count"}],[{"type":"String","basicValue":"high"},{"type":"Double","basicValue":3}],[{"type":"String","basicValue":"low"},{"type":"Double","basicValue":3}]]},"probabilities":{"type":"Array","elements":[[{"type":"Double","basicValue":0},{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":0}]]},"neighbor_count":{"type":"Double","basicValue":3},"effective_metric":{"type":"String","basicValue":"euclidean"}}}
Example 2: Use distance weighting for numeric target labels
Inputs:
| data | target | n_neighbors | knn_weights | knn_metric | p | |
|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 3 | distance | euclidean | 2 |
| 0 | 0.2 | 0 | ||||
| 0.2 | 0 | 0 | ||||
| 2 | 2 | 1 | ||||
| 2.1 | 2 | 1 | ||||
| 2 | 2.1 | 1 |
Excel formula:
=KNN_CLASSIFY({0,0;0,0.2;0.2,0;2,2;2.1,2;2,2.1}, {0;0;0;1;1;1}, 3, "distance", "euclidean", 2)
Expected output:
{"type":"Double","basicValue":1,"properties":{"accuracy":{"type":"Double","basicValue":1},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":2},"class_count":{"type":"Double","basicValue":2},"classes":{"type":"Array","elements":[[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1}]]},"predictions":{"type":"Array","elements":[[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":1}]]},"prediction_counts":{"type":"Array","elements":[[{"type":"String","basicValue":"class"},{"type":"String","basicValue":"count"}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":3}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":3}]]},"probabilities":{"type":"Array","elements":[[{"type":"Double","basicValue":1},{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":1}]]},"neighbor_count":{"type":"Double","basicValue":3},"effective_metric":{"type":"String","basicValue":"euclidean"}}}
Example 3: Fit k-nearest neighbors for three groups with Manhattan distance
Inputs:
| data | target | n_neighbors | knn_weights | knn_metric | p | |
|---|---|---|---|---|---|---|
| 0 | 0 | left | 1 | uniform | manhattan | 1 |
| 0.1 | 0.2 | left | ||||
| 4 | 4 | center | ||||
| 4.1 | 3.9 | center | ||||
| 8 | 0 | right | ||||
| 8.1 | 0.2 | right |
Excel formula:
=KNN_CLASSIFY({0,0;0.1,0.2;4,4;4.1,3.9;8,0;8.1,0.2}, {"left";"left";"center";"center";"right";"right"}, 1, "uniform", "manhattan", 1)
Expected output:
{"type":"Double","basicValue":1,"properties":{"accuracy":{"type":"Double","basicValue":1},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":2},"class_count":{"type":"Double","basicValue":3},"classes":{"type":"Array","elements":[[{"type":"String","basicValue":"center"}],[{"type":"String","basicValue":"left"}],[{"type":"String","basicValue":"right"}]]},"predictions":{"type":"Array","elements":[[{"type":"String","basicValue":"left"}],[{"type":"String","basicValue":"left"}],[{"type":"String","basicValue":"center"}],[{"type":"String","basicValue":"center"}],[{"type":"String","basicValue":"right"}],[{"type":"String","basicValue":"right"}]]},"prediction_counts":{"type":"Array","elements":[[{"type":"String","basicValue":"class"},{"type":"String","basicValue":"count"}],[{"type":"String","basicValue":"center"},{"type":"Double","basicValue":2}],[{"type":"String","basicValue":"left"},{"type":"Double","basicValue":2}],[{"type":"String","basicValue":"right"},{"type":"Double","basicValue":2}]]},"probabilities":{"type":"Array","elements":[[{"type":"Double","basicValue":0},{"type":"Double","basicValue":1},{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":1},{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":0},{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":0},{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":0},{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":0},{"type":"Double","basicValue":1}]]},"neighbor_count":{"type":"Double","basicValue":1},"effective_metric":{"type":"String","basicValue":"manhattan"}}}
Example 4: Flatten a single-row boolean target range for k-nearest neighbors
Inputs:
| data | target | n_neighbors | knn_weights | knn_metric | p | |||||
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | false | false | false | true | true | true | 1 | uniform | euclidean | 2 |
| 0.3 | ||||||||||
| 0.6 | ||||||||||
| 1.4 | ||||||||||
| 1.7 | ||||||||||
| 2 |
Excel formula:
=KNN_CLASSIFY({0;0.3;0.6;1.4;1.7;2}, {FALSE,FALSE,FALSE,TRUE,TRUE,TRUE}, 1, "uniform", "euclidean", 2)
Expected output:
{"type":"Double","basicValue":1,"properties":{"accuracy":{"type":"Double","basicValue":1},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":1},"class_count":{"type":"Double","basicValue":2},"classes":{"type":"Array","elements":[[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":true}]]},"predictions":{"type":"Array","elements":[[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":true}],[{"type":"Boolean","basicValue":true}],[{"type":"Boolean","basicValue":true}]]},"prediction_counts":{"type":"Array","elements":[[{"type":"String","basicValue":"class"},{"type":"String","basicValue":"count"}],[{"type":"Boolean","basicValue":false},{"type":"Double","basicValue":3}],[{"type":"Boolean","basicValue":true},{"type":"Double","basicValue":3}]]},"probabilities":{"type":"Array","elements":[[{"type":"Double","basicValue":1},{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":1}]]},"neighbor_count":{"type":"Double","basicValue":1},"effective_metric":{"type":"String","basicValue":"euclidean"}}}
Python Code
import numpy as np
from sklearn.neighbors import KNeighborsClassifier as SklearnKNeighborsClassifier
def knn_classify(data, target, n_neighbors=5, knn_weights='uniform', knn_metric='minkowski', p=2):
"""
Fit a k-nearest neighbors classifier and return training predictions.
See: https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html
This example function is provided as-is without any representation of accuracy.
Args:
data (list[list]): 2D array of numeric feature data with rows as samples and columns as features.
target (list[list]): Target labels as a single row, single column, or scalar when only one sample is present.
n_neighbors (int, optional): Number of nearest neighbors used for each vote. Default is 5.
knn_weights (str, optional): Weighting scheme used when aggregating neighbor votes. Valid options: 'uniform', 'distance'. Default is 'uniform'.
knn_metric (str, optional): Distance metric used to compare samples. Valid options: 'minkowski', 'euclidean', 'manhattan'. Default is 'minkowski'.
p (int, optional): Power parameter for the Minkowski metric. Default is 2.
Returns:
dict: Excel data type containing training accuracy, predictions, probabilities, and k-nearest-neighbor summary properties.
"""
def py(value):
return value.item() if isinstance(value, np.generic) else value
def cell(value):
value = py(value)
if isinstance(value, bool):
return {"type": "Boolean", "basicValue": bool(value)}
if isinstance(value, (int, float)) and not isinstance(value, bool):
return {"type": "Double", "basicValue": float(value)}
return {"type": "String", "basicValue": str(value)}
def col(values):
return [[cell(value)] for value in values]
def mat(values):
return [[cell(value) for value in row] for row in values]
def parse_data(value):
value = [[value]] if not isinstance(value, list) else value
if not isinstance(value, list) or not value or not all(isinstance(row, list) and row for row in value):
return None, "Error: data must be a non-empty 2D list"
if len({len(row) for row in value}) != 1:
return None, "Error: data must be a rectangular 2D list"
data_np = np.array(value, dtype=float)
if data_np.ndim != 2 or data_np.size == 0:
return None, "Error: data must be a non-empty 2D list"
if not np.isfinite(data_np).all():
return None, "Error: data must contain only finite numeric values"
return data_np, None
def parse_target(value, sample_count):
if not isinstance(value, list):
labels = [value]
elif not value:
return None, "Error: target must be non-empty"
elif all(not isinstance(item, list) for item in value):
labels = value
elif len(value) == 1:
labels = value[0]
elif all(isinstance(row, list) and len(row) == 1 for row in value):
labels = [row[0] for row in value]
else:
return None, "Error: target must be a single row or column"
if len(labels) != sample_count:
return None, "Error: target length must match sample count"
parsed = []
classes = []
for item in labels:
item = py(item)
if isinstance(item, str):
if not item.strip():
return None, "Error: target labels must not be blank"
elif isinstance(item, bool):
item = bool(item)
elif isinstance(item, (int, float)) and not isinstance(item, bool):
if not np.isfinite(float(item)):
return None, "Error: target labels must be finite"
item = float(item) if isinstance(item, float) else int(item)
else:
return None, "Error: target labels must be scalar string, boolean, or numeric values"
parsed.append(item)
if not any(type(existing) is type(item) and existing == item for existing in classes):
classes.append(item)
if len(classes) < 2:
return None, "Error: target must contain at least 2 classes"
return parsed, None
def count_table(predictions, classes):
rows = [[{"type": "String", "basicValue": "class"}, {"type": "String", "basicValue": "count"}]]
for class_label in classes:
count = sum(type(prediction) is type(class_label) and prediction == class_label for prediction in predictions)
rows.append([cell(class_label), {"type": "Double", "basicValue": float(count)}])
return rows
try:
data_np, error = parse_data(data)
if error:
return error
target_values, error = parse_target(target, data_np.shape[0])
if error:
return error
neighbor_total = int(n_neighbors)
if neighbor_total < 1:
return "Error: n_neighbors must be at least 1"
if neighbor_total > data_np.shape[0]:
return "Error: n_neighbors cannot exceed the number of samples"
weights_value = str(knn_weights).strip().lower()
if weights_value not in {"uniform", "distance"}:
return "Error: weights must be 'uniform' or 'distance'"
metric_value = str(knn_metric).strip().lower()
if metric_value not in {"minkowski", "euclidean", "manhattan"}:
return "Error: metric must be 'minkowski', 'euclidean', or 'manhattan'"
if int(p) < 1:
return "Error: p must be at least 1"
fitted = SklearnKNeighborsClassifier(
n_neighbors=neighbor_total,
weights=weights_value,
metric=metric_value,
p=int(p)
).fit(data_np, target_values)
prediction_array = fitted.predict(data_np)
predictions = [py(item) for item in prediction_array.tolist()]
classes = [py(item) for item in fitted.classes_.tolist()]
accuracy = float(np.mean([
type(prediction) is type(actual) and prediction == actual
for prediction, actual in zip(predictions, target_values)
]))
return {
"type": "Double",
"basicValue": accuracy,
"properties": {
"accuracy": {"type": "Double", "basicValue": accuracy},
"sample_count": {"type": "Double", "basicValue": float(data_np.shape[0])},
"feature_count": {"type": "Double", "basicValue": float(data_np.shape[1])},
"class_count": {"type": "Double", "basicValue": float(len(classes))},
"classes": {"type": "Array", "elements": col(classes)},
"predictions": {"type": "Array", "elements": col(predictions)},
"prediction_counts": {"type": "Array", "elements": count_table(predictions, classes)},
"probabilities": {"type": "Array", "elements": mat(fitted.predict_proba(data_np).tolist())},
"neighbor_count": {"type": "Double", "basicValue": float(neighbor_total)},
"effective_metric": {"type": "String", "basicValue": str(fitted.effective_metric_)}
}
}
except Exception as e:
return f"Error: {str(e)}"
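A note on the `type(a) is type(b) and a == b` comparisons used in `accuracy` and `count_table` above: plain equality would conflate label types, because in Python `True == 1` and `1.0 == 1`. A minimal sketch of the distinction:

```python
# Type-strict label comparison, as used by the wrapper's accuracy and
# prediction_counts logic.
def same_label(a, b):
    return type(a) is type(b) and a == b

same_label(True, 1)       # False -- bool and int are distinct label classes
same_label(1.0, 1)        # False -- float and int are distinct label classes
same_label("low", "low")  # True
```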