KMEANS

K-means clustering partitions a set of samples into a fixed number of clusters by iteratively assigning each sample to the nearest centroid and then recomputing the centroid positions.

The algorithm minimizes the within-cluster sum of squares (inertia), defined as:

J = \sum_{j=1}^{k} \sum_{x_i \in S_j} ||x_i - \mu_j||^2

where k is the number of clusters, S_j is the j-th cluster, and \mu_j is the centroid of S_j.

This wrapper accepts data with rows as samples and columns as features. It returns the fitted labels, a compact label count table, the learned cluster centers, and the final inertia, which is the sum of squared distances from each sample to its assigned centroid.

Excel Usage

=KMEANS(data, n_clusters, kmeans_init, max_iter, tol, random_state)
  • data (list[list], required): 2D array of input data with rows as samples and columns as features.
  • n_clusters (int, optional, default: 8): Number of clusters to fit.
  • kmeans_init (str, optional, default: “k-means++”): Initialization method for the starting centroids.
  • max_iter (int, optional, default: 300): Maximum number of k-means iterations.
  • tol (float, optional, default: 0.0001): Relative tolerance for declaring convergence.
  • random_state (int, optional, default: null): Integer seed for reproducible centroid initialization. Leave blank for non-deterministic runs.

Returns (dict): Excel data type containing inertia, labels, label counts, and cluster centers.

Example 1: Separate two compact point clouds with k-means++

Inputs:

data n_clusters kmeans_init max_iter tol random_state
0 0 2 k-means++ 300 0.0001 0
0 1
1 0
5 5
5 6
6 5

Excel formula:

=KMEANS({0,0;0,1;1,0;5,5;5,6;6,5}, 2, "k-means++", 300, 0.0001, 0)

Expected output:

{"type":"Double","basicValue":2.66667,"properties":{"inertia":{"type":"Double","basicValue":2.66667},"cluster_count":{"type":"Double","basicValue":2},"labels":{"type":"Array","elements":[[{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0}]]},"label_counts":{"type":"Array","elements":[[{"type":"String","basicValue":"label"},{"type":"String","basicValue":"count"}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":3}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":3}]]},"cluster_centers":{"type":"Array","elements":[[{"type":"Double","basicValue":5.33333},{"type":"Double","basicValue":5.33333}],[{"type":"Double","basicValue":0.333333},{"type":"Double","basicValue":0.333333}]]},"n_iter":{"type":"Double","basicValue":2}}}

Example 2: Fit three compact groups with random initialization

Inputs:

data n_clusters kmeans_init max_iter tol random_state
1 1 3 random 200 0.0001 7
1.2 0.8
0.8 1.1
8 8
8.2 7.9
7.8 8.1
15 1
15.2 0.8
14.8 1.1

Excel formula:

=KMEANS({1,1;1.2,0.8;0.8,1.1;8,8;8.2,7.9;7.8,8.1;15,1;15.2,0.8;14.8,1.1}, 3, "random", 200, 0.0001, 7)

Expected output:

{"type":"Double","basicValue":0.353333,"properties":{"inertia":{"type":"Double","basicValue":0.353333},"cluster_count":{"type":"Double","basicValue":3},"labels":{"type":"Array","elements":[[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":2}],[{"type":"Double","basicValue":2}],[{"type":"Double","basicValue":2}],[{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":1}]]},"label_counts":{"type":"Array","elements":[[{"type":"String","basicValue":"label"},{"type":"String","basicValue":"count"}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":3}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":3}],[{"type":"Double","basicValue":2},{"type":"Double","basicValue":3}]]},"cluster_centers":{"type":"Array","elements":[[{"type":"Double","basicValue":1},{"type":"Double","basicValue":0.966667}],[{"type":"Double","basicValue":15},{"type":"Double","basicValue":0.966667}],[{"type":"Double","basicValue":8},{"type":"Double","basicValue":8}]]},"n_iter":{"type":"Double","basicValue":2}}}

Example 3: Cluster a single Excel cell as one sample

Inputs:

data n_clusters kmeans_init max_iter tol random_state
5 1 k-means++ 100 0.0001 0

Excel formula:

=KMEANS(5, 1, "k-means++", 100, 0.0001, 0)

Expected output:

{"type":"Double","basicValue":0,"properties":{"inertia":{"type":"Double","basicValue":0},"cluster_count":{"type":"Double","basicValue":1},"labels":{"type":"Array","elements":[[{"type":"Double","basicValue":0}]]},"label_counts":{"type":"Array","elements":[[{"type":"String","basicValue":"label"},{"type":"String","basicValue":"count"}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":1}]]},"cluster_centers":{"type":"Array","elements":[[{"type":"Double","basicValue":5}]]},"n_iter":{"type":"Double","basicValue":1}}}

Example 4: Cluster one-dimensional samples into two groups

Inputs:

data n_clusters kmeans_init max_iter tol random_state
0 2 k-means++ 150 0.0001 4
0.2
0.4
4.8
5
5.2

Excel formula:

=KMEANS({0;0.2;0.4;4.8;5;5.2}, 2, "k-means++", 150, 0.0001, 4)

Expected output:

{"type":"Double","basicValue":0.16,"properties":{"inertia":{"type":"Double","basicValue":0.16},"cluster_count":{"type":"Double","basicValue":2},"labels":{"type":"Array","elements":[[{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0}]]},"label_counts":{"type":"Array","elements":[[{"type":"String","basicValue":"label"},{"type":"String","basicValue":"count"}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":3}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":3}]]},"cluster_centers":{"type":"Array","elements":[[{"type":"Double","basicValue":5}],[{"type":"Double","basicValue":0.2}]]},"n_iter":{"type":"Double","basicValue":2}}}

Python Code

import numpy as np
from sklearn.cluster import KMeans as SklearnKMeans

def kmeans(data, n_clusters=8, kmeans_init='k-means++', max_iter=300, tol=0.0001, random_state=None):
    """
    Partition samples into centroid-based clusters with k-means.

    See: https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        data (list[list]): 2D array of input data with rows as samples and columns as features.
        n_clusters (int, optional): Number of clusters to fit. Default is 8.
        kmeans_init (str, optional): Initialization method for the starting centroids. Valid options: K-means++, Random. Default is 'k-means++'.
        max_iter (int, optional): Maximum number of k-means iterations. Default is 300.
        tol (float, optional): Relative tolerance for declaring convergence. Default is 0.0001.
        random_state (int, optional): Integer seed for reproducible centroid initialization. Leave blank for non-deterministic runs. Default is None.

    Returns:
        dict: Excel data type containing inertia, labels, label counts, and cluster centers.
    """
    def to2d(value):
        return [[value]] if not isinstance(value, list) else value

    def parse_matrix(value):
        value = to2d(value)
        if not isinstance(value, list) or not value or not all(isinstance(row, list) and row for row in value):
            return None, "Error: data must be a non-empty 2D list"
        if len({len(row) for row in value}) != 1:
            return None, "Error: data must be a rectangular 2D list"
        matrix = np.array(value, dtype=float)
        if matrix.ndim != 2 or matrix.size == 0:
            return None, "Error: data must be a non-empty 2D list"
        if not np.isfinite(matrix).all():
            return None, "Error: data must contain only finite numeric values"
        return matrix, None

    def as_column(values):
        return [[{"type": "Double", "basicValue": float(item)}] for item in values]

    def as_matrix(values):
        return [[{"type": "Double", "basicValue": float(item)} for item in row] for row in values]

    def label_count_table(labels):
        unique_labels, counts = np.unique(labels, return_counts=True)
        rows = [[{"type": "String", "basicValue": "label"}, {"type": "String", "basicValue": "count"}]]
        rows.extend(
            [[{"type": "Double", "basicValue": float(label)}, {"type": "Double", "basicValue": float(count)}]
             for label, count in zip(unique_labels.tolist(), counts.tolist())]
        )
        return rows

    try:
        data_np, error = parse_matrix(data)
        if error:
            return error

        cluster_total = int(n_clusters)
        if cluster_total < 1:
            return "Error: n_clusters must be at least 1"
        if cluster_total > data_np.shape[0]:
            return "Error: n_clusters cannot exceed the number of samples"

        init_value = str(kmeans_init).strip()
        if init_value not in {"k-means++", "random"}:
            return "Error: init must be 'k-means++' or 'random'"

        if int(max_iter) < 1:
            return "Error: max_iter must be at least 1"
        if float(tol) < 0:
            return "Error: tol must be non-negative"

        seed = None if random_state in (None, "") else int(random_state)
        fitted = SklearnKMeans(
            n_clusters=cluster_total,
            init=init_value,
            max_iter=int(max_iter),
            tol=float(tol),
            random_state=seed,
            n_init="auto"
        ).fit(data_np)

        labels = fitted.labels_
        cluster_count = int(np.unique(labels).size)

        return {
            "type": "Double",
            "basicValue": float(fitted.inertia_),
            "properties": {
                "inertia": {"type": "Double", "basicValue": float(fitted.inertia_)},
                "cluster_count": {"type": "Double", "basicValue": float(cluster_count)},
                "labels": {"type": "Array", "elements": as_column(labels.tolist())},
                "label_counts": {"type": "Array", "elements": label_count_table(labels)},
                "cluster_centers": {"type": "Array", "elements": as_matrix(fitted.cluster_centers_.tolist())},
                "n_iter": {"type": "Double", "basicValue": float(fitted.n_iter_)}
            }
        }
    except Exception as e:
        return f"Error: {str(e)}"

Online Calculator

2D array of input data with rows as samples and columns as features.
Number of clusters to fit.
Initialization method for the starting centroids.
Maximum number of k-means iterations.
Relative tolerance for declaring convergence.
Integer seed for reproducible centroid initialization. Leave blank for non-deterministic runs.