KNN_CLASSIFY

K-nearest neighbors classification predicts the label of a sample based on the labels of its k closest neighbors in the training set. The distance between samples x and y is typically measured using the Minkowski distance:

d(x, y) = \left( \sum_{i=1}^n |x_i - y_i|^p \right)^{1/p}

When p=2, this corresponds to the standard Euclidean distance, and when p=1, it is the Manhattan distance. The function can also return class probabilities derived from the proportion of neighbors supporting each class.
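
The two special cases can be checked numerically with a small NumPy sketch (illustration only, not part of the wrapper):

```python
import numpy as np

def minkowski(x, y, p):
    """Minkowski distance between two vectors for a given power p."""
    return float(np.sum(np.abs(np.asarray(x) - np.asarray(y)) ** p) ** (1.0 / p))

x, y = [0.0, 0.0], [3.0, 4.0]
print(minkowski(x, y, p=1))  # Manhattan: |3| + |4| = 7.0
print(minkowski(x, y, p=2))  # Euclidean: sqrt(9 + 16) = 5.0
```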

This wrapper accepts rows as samples and a target supplied as a single row or single column. It returns training accuracy together with predicted labels, class counts, fitted class probabilities, and the resolved distance metric.
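
The target flattening the wrapper performs can be sketched in plain Python (a simplified illustration with error checking omitted; the real function also validates lengths and label types):

```python
def flatten_target(value):
    """Accept a scalar, flat list, single row, or single column and
    return a flat list of labels."""
    if not isinstance(value, list):
        return [value]                    # scalar -> one-element list
    if all(not isinstance(item, list) for item in value):
        return value                      # already flat
    if len(value) == 1:
        return value[0]                   # single row: [[a, b, c]]
    return [row[0] for row in value]      # single column: [[a], [b], [c]]

print(flatten_target([["low"], ["high"]]))  # ['low', 'high']
print(flatten_target([["low", "high"]]))    # ['low', 'high']
```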

Excel Usage

=KNN_CLASSIFY(data, target, n_neighbors, knn_weights, knn_metric, p)
  • data (list[list], required): 2D array of numeric feature data with rows as samples and columns as features.
  • target (list[list], required): Target labels as a single row, single column, or scalar when only one sample is present.
  • n_neighbors (int, optional, default: 5): Number of nearest neighbors used for each vote.
  • knn_weights (str, optional, default: "uniform"): Weighting scheme used when aggregating neighbor votes. Valid options: "uniform", "distance".
  • knn_metric (str, optional, default: "minkowski"): Distance metric used to compare samples. Valid options: "minkowski", "euclidean", "manhattan".
  • p (int, optional, default: 2): Power parameter for the Minkowski metric.

Returns (dict): Excel data type containing training accuracy, predictions, probabilities, and k-nearest-neighbor summary properties.

Example 1: Classify two string-labeled groups with uniform neighbor votes

Inputs:

data  target  n_neighbors  knn_weights  knn_metric  p
0     low     3            uniform      euclidean   2
0.1   low
0.2   low
1.5   high
1.6   high
1.7   high

Excel formula:

=KNN_CLASSIFY({0;0.1;0.2;1.5;1.6;1.7}, {"low";"low";"low";"high";"high";"high"}, 3, "uniform", "euclidean", 2)

Expected output:

{"type":"Double","basicValue":1,"properties":{"accuracy":{"type":"Double","basicValue":1},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":1},"class_count":{"type":"Double","basicValue":2},"classes":{"type":"Array","elements":[[{"type":"String","basicValue":"high"}],[{"type":"String","basicValue":"low"}]]},"predictions":{"type":"Array","elements":[[{"type":"String","basicValue":"low"}],[{"type":"String","basicValue":"low"}],[{"type":"String","basicValue":"low"}],[{"type":"String","basicValue":"high"}],[{"type":"String","basicValue":"high"}],[{"type":"String","basicValue":"high"}]]},"prediction_counts":{"type":"Array","elements":[[{"type":"String","basicValue":"class"},{"type":"String","basicValue":"count"}],[{"type":"String","basicValue":"high"},{"type":"Double","basicValue":3}],[{"type":"String","basicValue":"low"},{"type":"Double","basicValue":3}]]},"probabilities":{"type":"Array","elements":[[{"type":"Double","basicValue":0},{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":0}]]},"neighbor_count":{"type":"Double","basicValue":3},"effective_metric":{"type":"String","basicValue":"euclidean"}}}

Example 2: Use distance weighting for numeric target labels

Inputs:

data       target  n_neighbors  knn_weights  knn_metric  p
0    0     0       3            distance     euclidean   2
0    0.2   0
0.2  0     0
2    2     1
2.1  2     1
2    2.1   1

Excel formula:

=KNN_CLASSIFY({0,0;0,0.2;0.2,0;2,2;2.1,2;2,2.1}, {0;0;0;1;1;1}, 3, "distance", "euclidean", 2)

Expected output:

{"type":"Double","basicValue":1,"properties":{"accuracy":{"type":"Double","basicValue":1},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":2},"class_count":{"type":"Double","basicValue":2},"classes":{"type":"Array","elements":[[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1}]]},"predictions":{"type":"Array","elements":[[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":1}]]},"prediction_counts":{"type":"Array","elements":[[{"type":"String","basicValue":"class"},{"type":"String","basicValue":"count"}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":3}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":3}]]},"probabilities":{"type":"Array","elements":[[{"type":"Double","basicValue":1},{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":1}]]},"neighbor_count":{"type":"Double","basicValue":3},"effective_metric":{"type":"String","basicValue":"euclidean"}}}
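
The effect of distance weighting can be sketched in plain NumPy (a simplified illustration of inverse-distance voting, not sklearn's exact zero-distance handling): each neighbor's vote is scaled by 1/d, so closer neighbors dominate and can outvote a numerical majority.

```python
import numpy as np

# Hypothetical query point with its 3 nearest training neighbors
distances = np.array([0.5, 1.0, 2.0])  # distances to the neighbors
labels    = np.array([1, 0, 0])        # their class labels

weights = 1.0 / distances              # inverse-distance weights: 2.0, 1.0, 0.5
score_0 = weights[labels == 0].sum()   # 1.0 + 0.5 = 1.5
score_1 = weights[labels == 1].sum()   # 2.0
print(1 if score_1 > score_0 else 0)   # prints 1: the closest neighbor wins
```

Under uniform weighting this query would be classified as 0 (two votes to one); distance weighting flips the decision because the single class-1 neighbor is much closer.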

Example 3: Fit k-nearest neighbors for three groups with Manhattan distance

Inputs:

data        target  n_neighbors  knn_weights  knn_metric  p
0    0      left    1            uniform      manhattan   1
0.1  0.2    left
4    4      center
4.1  3.9    center
8    0      right
8.1  0.2    right

Excel formula:

=KNN_CLASSIFY({0,0;0.1,0.2;4,4;4.1,3.9;8,0;8.1,0.2}, {"left";"left";"center";"center";"right";"right"}, 1, "uniform", "manhattan", 1)

Expected output:

{"type":"Double","basicValue":1,"properties":{"accuracy":{"type":"Double","basicValue":1},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":2},"class_count":{"type":"Double","basicValue":3},"classes":{"type":"Array","elements":[[{"type":"String","basicValue":"center"}],[{"type":"String","basicValue":"left"}],[{"type":"String","basicValue":"right"}]]},"predictions":{"type":"Array","elements":[[{"type":"String","basicValue":"left"}],[{"type":"String","basicValue":"left"}],[{"type":"String","basicValue":"center"}],[{"type":"String","basicValue":"center"}],[{"type":"String","basicValue":"right"}],[{"type":"String","basicValue":"right"}]]},"prediction_counts":{"type":"Array","elements":[[{"type":"String","basicValue":"class"},{"type":"String","basicValue":"count"}],[{"type":"String","basicValue":"center"},{"type":"Double","basicValue":2}],[{"type":"String","basicValue":"left"},{"type":"Double","basicValue":2}],[{"type":"String","basicValue":"right"},{"type":"Double","basicValue":2}]]},"probabilities":{"type":"Array","elements":[[{"type":"Double","basicValue":0},{"type":"Double","basicValue":1},{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":1},{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":0},{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":0},{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":0},{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":0},{"type":"Double","basicValue":1}]]},"neighbor_count":{"type":"Double","basicValue":1},"effective_metric":{"type":"String","basicValue":"manhattan"}}}

Example 4: Flatten a single-row boolean target range for k-nearest neighbors

Inputs:

data  target                                 n_neighbors  knn_weights  knn_metric  p
0     false  false  false  true  true  true  1            uniform      euclidean   2
0.3
0.6
1.4
1.7
2

Excel formula:

=KNN_CLASSIFY({0;0.3;0.6;1.4;1.7;2}, {FALSE,FALSE,FALSE,TRUE,TRUE,TRUE}, 1, "uniform", "euclidean", 2)

Expected output:

{"type":"Double","basicValue":1,"properties":{"accuracy":{"type":"Double","basicValue":1},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":1},"class_count":{"type":"Double","basicValue":2},"classes":{"type":"Array","elements":[[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":true}]]},"predictions":{"type":"Array","elements":[[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":true}],[{"type":"Boolean","basicValue":true}],[{"type":"Boolean","basicValue":true}]]},"prediction_counts":{"type":"Array","elements":[[{"type":"String","basicValue":"class"},{"type":"String","basicValue":"count"}],[{"type":"Boolean","basicValue":false},{"type":"Double","basicValue":3}],[{"type":"Boolean","basicValue":true},{"type":"Double","basicValue":3}]]},"probabilities":{"type":"Array","elements":[[{"type":"Double","basicValue":1},{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":1}]]},"neighbor_count":{"type":"Double","basicValue":1},"effective_metric":{"type":"String","basicValue":"euclidean"}}}

Python Code

import numpy as np
from sklearn.neighbors import KNeighborsClassifier as SklearnKNeighborsClassifier

def knn_classify(data, target, n_neighbors=5, knn_weights='uniform', knn_metric='minkowski', p=2):
    """
    Fit a k-nearest neighbors classifier and return training predictions.

    See: https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        data (list[list]): 2D array of numeric feature data with rows as samples and columns as features.
        target (list[list]): Target labels as a single row, single column, or scalar when only one sample is present.
        n_neighbors (int, optional): Number of nearest neighbors used for each vote. Default is 5.
        knn_weights (str, optional): Weighting scheme used when aggregating neighbor votes. Valid options: 'uniform', 'distance'. Default is 'uniform'.
        knn_metric (str, optional): Distance metric used to compare samples. Valid options: 'minkowski', 'euclidean', 'manhattan'. Default is 'minkowski'.
        p (int, optional): Power parameter for the Minkowski metric. Default is 2.

    Returns:
        dict: Excel data type containing training accuracy, predictions, probabilities, and k-nearest-neighbor summary properties.
    """
    def py(value):
        return value.item() if isinstance(value, np.generic) else value

    def cell(value):
        value = py(value)
        if isinstance(value, bool):
            return {"type": "Boolean", "basicValue": bool(value)}
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            return {"type": "Double", "basicValue": float(value)}
        return {"type": "String", "basicValue": str(value)}

    def col(values):
        return [[cell(value)] for value in values]

    def mat(values):
        return [[cell(value) for value in row] for row in values]

    def parse_data(value):
        value = [[value]] if not isinstance(value, list) else value
        if not isinstance(value, list) or not value or not all(isinstance(row, list) and row for row in value):
            return None, "Error: data must be a non-empty 2D list"
        if len({len(row) for row in value}) != 1:
            return None, "Error: data must be a rectangular 2D list"
        data_np = np.array(value, dtype=float)
        if data_np.ndim != 2 or data_np.size == 0:
            return None, "Error: data must be a non-empty 2D list"
        if not np.isfinite(data_np).all():
            return None, "Error: data must contain only finite numeric values"
        return data_np, None

    def parse_target(value, sample_count):
        if not isinstance(value, list):
            labels = [value]
        elif not value:
            return None, "Error: target must be non-empty"
        elif all(not isinstance(item, list) for item in value):
            labels = value
        elif len(value) == 1:
            labels = value[0]
        elif all(isinstance(row, list) and len(row) == 1 for row in value):
            labels = [row[0] for row in value]
        else:
            return None, "Error: target must be a single row or column"

        if len(labels) != sample_count:
            return None, "Error: target length must match sample count"

        parsed = []
        classes = []
        for item in labels:
            item = py(item)
            if isinstance(item, str):
                if not item.strip():
                    return None, "Error: target labels must not be blank"
            elif isinstance(item, bool):
                item = bool(item)
            elif isinstance(item, (int, float)) and not isinstance(item, bool):
                if not np.isfinite(float(item)):
                    return None, "Error: target labels must be finite"
                item = float(item) if isinstance(item, float) else int(item)
            else:
                return None, "Error: target labels must be scalar string, boolean, or numeric values"
            parsed.append(item)
            if not any(type(existing) is type(item) and existing == item for existing in classes):
                classes.append(item)

        if len(classes) < 2:
            return None, "Error: target must contain at least 2 classes"
        return parsed, None

    def count_table(predictions, classes):
        rows = [[{"type": "String", "basicValue": "class"}, {"type": "String", "basicValue": "count"}]]
        for class_label in classes:
            count = sum(type(prediction) is type(class_label) and prediction == class_label for prediction in predictions)
            rows.append([cell(class_label), {"type": "Double", "basicValue": float(count)}])
        return rows

    try:
        data_np, error = parse_data(data)
        if error:
            return error

        target_values, error = parse_target(target, data_np.shape[0])
        if error:
            return error

        neighbor_total = int(n_neighbors)
        if neighbor_total < 1:
            return "Error: n_neighbors must be at least 1"
        if neighbor_total > data_np.shape[0]:
            return "Error: n_neighbors cannot exceed the number of samples"

        weights_value = str(knn_weights).strip().lower()
        if weights_value not in {"uniform", "distance"}:
            return "Error: knn_weights must be 'uniform' or 'distance'"

        metric_value = str(knn_metric).strip().lower()
        if metric_value not in {"minkowski", "euclidean", "manhattan"}:
            return "Error: knn_metric must be 'minkowski', 'euclidean', or 'manhattan'"
        if int(p) < 1:
            return "Error: p must be at least 1"

        fitted = SklearnKNeighborsClassifier(
            n_neighbors=neighbor_total,
            weights=weights_value,
            metric=metric_value,
            p=int(p)
        ).fit(data_np, target_values)

        prediction_array = fitted.predict(data_np)
        predictions = [py(item) for item in prediction_array.tolist()]
        classes = [py(item) for item in fitted.classes_.tolist()]
        accuracy = float(np.mean([
            type(prediction) is type(actual) and prediction == actual
            for prediction, actual in zip(predictions, target_values)
        ]))

        return {
            "type": "Double",
            "basicValue": accuracy,
            "properties": {
                "accuracy": {"type": "Double", "basicValue": accuracy},
                "sample_count": {"type": "Double", "basicValue": float(data_np.shape[0])},
                "feature_count": {"type": "Double", "basicValue": float(data_np.shape[1])},
                "class_count": {"type": "Double", "basicValue": float(len(classes))},
                "classes": {"type": "Array", "elements": col(classes)},
                "predictions": {"type": "Array", "elements": col(predictions)},
                "prediction_counts": {"type": "Array", "elements": count_table(predictions, classes)},
                "probabilities": {"type": "Array", "elements": mat(fitted.predict_proba(data_np).tolist())},
                "neighbor_count": {"type": "Double", "basicValue": float(neighbor_total)},
                "effective_metric": {"type": "String", "basicValue": str(fitted.effective_metric_)}
            }
        }
    except Exception as e:
        return f"Error: {str(e)}"
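
The neighbor vote the wrapper delegates to sklearn can be mirrored in a few lines of NumPy. This sketch reproduces Example 1's training predictions (uniform weights, k=3, Euclidean) without sklearn, to show the mechanism rather than the exact API; `knn_predict` is a hypothetical helper, not part of the wrapper:

```python
import numpy as np

def knn_predict(X, y, k):
    """Uniform-vote KNN 'training predictions': each sample is classified
    by the majority label among its k nearest points (itself included)."""
    X = np.asarray(X, dtype=float)
    preds = []
    for row in X:
        d = np.sqrt(((X - row) ** 2).sum(axis=1))  # Euclidean distances
        nearest = np.argsort(d)[:k]                # indices of k closest samples
        labels, counts = np.unique([y[i] for i in nearest], return_counts=True)
        preds.append(str(labels[np.argmax(counts)]))
    return preds

X = [[0], [0.1], [0.2], [1.5], [1.6], [1.7]]
y = ["low", "low", "low", "high", "high", "high"]
print(knn_predict(X, y, k=3))  # ['low', 'low', 'low', 'high', 'high', 'high']
```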
