KMEANS
K-means clustering partitions a set of samples into a fixed number of clusters by iteratively assigning each sample to the nearest centroid and then recomputing the centroid positions.
The algorithm minimizes the within-cluster sum of squares (inertia), defined as:
J = \sum_{j=1}^{k} \sum_{x_i \in S_j} ||x_i - \mu_j||^2
where k is the number of clusters, S_j is the j-th cluster, and \mu_j is the centroid of S_j.
This wrapper accepts data with rows as samples and columns as features. It returns the fitted labels, a compact label count table, the learned cluster centers, and the final inertia, which is the sum of squared distances from each sample to its assigned centroid.
Excel Usage
=KMEANS(data, n_clusters, kmeans_init, max_iter, tol, random_state)
data(list[list], required): 2D array of input data with rows as samples and columns as features.n_clusters(int, optional, default: 8): Number of clusters to fit.kmeans_init(str, optional, default: “k-means++”): Initialization method for the starting centroids.max_iter(int, optional, default: 300): Maximum number of k-means iterations.tol(float, optional, default: 0.0001): Relative tolerance for declaring convergence.random_state(int, optional, default: null): Integer seed for reproducible centroid initialization. Leave blank for non-deterministic runs.
Returns (dict): Excel data type containing inertia, labels, label counts, and cluster centers.
Example 1: Separate two compact point clouds with k-means++
Inputs:
| data | n_clusters | kmeans_init | max_iter | tol | random_state | |
|---|---|---|---|---|---|---|
| 0 | 0 | 2 | k-means++ | 300 | 0.0001 | 0 |
| 0 | 1 | |||||
| 1 | 0 | |||||
| 5 | 5 | |||||
| 5 | 6 | |||||
| 6 | 5 |
Excel formula:
=KMEANS({0,0;0,1;1,0;5,5;5,6;6,5}, 2, "k-means++", 300, 0.0001, 0)
Expected output:
{"type":"Double","basicValue":2.66667,"properties":{"inertia":{"type":"Double","basicValue":2.66667},"cluster_count":{"type":"Double","basicValue":2},"labels":{"type":"Array","elements":[[{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0}]]},"label_counts":{"type":"Array","elements":[[{"type":"String","basicValue":"label"},{"type":"String","basicValue":"count"}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":3}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":3}]]},"cluster_centers":{"type":"Array","elements":[[{"type":"Double","basicValue":5.33333},{"type":"Double","basicValue":5.33333}],[{"type":"Double","basicValue":0.333333},{"type":"Double","basicValue":0.333333}]]},"n_iter":{"type":"Double","basicValue":2}}}
Example 2: Fit three compact groups with random initialization
Inputs:
| data | n_clusters | kmeans_init | max_iter | tol | random_state | |
|---|---|---|---|---|---|---|
| 1 | 1 | 3 | random | 200 | 0.0001 | 7 |
| 1.2 | 0.8 | |||||
| 0.8 | 1.1 | |||||
| 8 | 8 | |||||
| 8.2 | 7.9 | |||||
| 7.8 | 8.1 | |||||
| 15 | 1 | |||||
| 15.2 | 0.8 | |||||
| 14.8 | 1.1 |
Excel formula:
=KMEANS({1,1;1.2,0.8;0.8,1.1;8,8;8.2,7.9;7.8,8.1;15,1;15.2,0.8;14.8,1.1}, 3, "random", 200, 0.0001, 7)
Expected output:
{"type":"Double","basicValue":0.353333,"properties":{"inertia":{"type":"Double","basicValue":0.353333},"cluster_count":{"type":"Double","basicValue":3},"labels":{"type":"Array","elements":[[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":2}],[{"type":"Double","basicValue":2}],[{"type":"Double","basicValue":2}],[{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":1}]]},"label_counts":{"type":"Array","elements":[[{"type":"String","basicValue":"label"},{"type":"String","basicValue":"count"}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":3}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":3}],[{"type":"Double","basicValue":2},{"type":"Double","basicValue":3}]]},"cluster_centers":{"type":"Array","elements":[[{"type":"Double","basicValue":1},{"type":"Double","basicValue":0.966667}],[{"type":"Double","basicValue":15},{"type":"Double","basicValue":0.966667}],[{"type":"Double","basicValue":8},{"type":"Double","basicValue":8}]]},"n_iter":{"type":"Double","basicValue":2}}}
Example 3: Cluster a single Excel cell as one sample
Inputs:
| data | n_clusters | kmeans_init | max_iter | tol | random_state |
|---|---|---|---|---|---|
| 5 | 1 | k-means++ | 100 | 0.0001 | 0 |
Excel formula:
=KMEANS(5, 1, "k-means++", 100, 0.0001, 0)
Expected output:
{"type":"Double","basicValue":0,"properties":{"inertia":{"type":"Double","basicValue":0},"cluster_count":{"type":"Double","basicValue":1},"labels":{"type":"Array","elements":[[{"type":"Double","basicValue":0}]]},"label_counts":{"type":"Array","elements":[[{"type":"String","basicValue":"label"},{"type":"String","basicValue":"count"}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":1}]]},"cluster_centers":{"type":"Array","elements":[[{"type":"Double","basicValue":5}]]},"n_iter":{"type":"Double","basicValue":1}}}
Example 4: Cluster one-dimensional samples into two groups
Inputs:
| data | n_clusters | kmeans_init | max_iter | tol | random_state |
|---|---|---|---|---|---|
| 0 | 2 | k-means++ | 150 | 0.0001 | 4 |
| 0.2 | |||||
| 0.4 | |||||
| 4.8 | |||||
| 5 | |||||
| 5.2 |
Excel formula:
=KMEANS({0;0.2;0.4;4.8;5;5.2}, 2, "k-means++", 150, 0.0001, 4)
Expected output:
{"type":"Double","basicValue":0.16,"properties":{"inertia":{"type":"Double","basicValue":0.16},"cluster_count":{"type":"Double","basicValue":2},"labels":{"type":"Array","elements":[[{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0}]]},"label_counts":{"type":"Array","elements":[[{"type":"String","basicValue":"label"},{"type":"String","basicValue":"count"}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":3}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":3}]]},"cluster_centers":{"type":"Array","elements":[[{"type":"Double","basicValue":5}],[{"type":"Double","basicValue":0.2}]]},"n_iter":{"type":"Double","basicValue":2}}}
Python Code
import numpy as np
from sklearn.cluster import KMeans as SklearnKMeans
def kmeans(data, n_clusters=8, kmeans_init='k-means++', max_iter=300, tol=0.0001, random_state=None):
"""
Partition samples into centroid-based clusters with k-means.
See: https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html
This example function is provided as-is without any representation of accuracy.
Args:
data (list[list]): 2D array of input data with rows as samples and columns as features.
n_clusters (int, optional): Number of clusters to fit. Default is 8.
kmeans_init (str, optional): Initialization method for the starting centroids. Valid options: K-means++, Random. Default is 'k-means++'.
max_iter (int, optional): Maximum number of k-means iterations. Default is 300.
tol (float, optional): Relative tolerance for declaring convergence. Default is 0.0001.
random_state (int, optional): Integer seed for reproducible centroid initialization. Leave blank for non-deterministic runs. Default is None.
Returns:
dict: Excel data type containing inertia, labels, label counts, and cluster centers.
"""
def to2d(value):
return [[value]] if not isinstance(value, list) else value
def parse_matrix(value):
value = to2d(value)
if not isinstance(value, list) or not value or not all(isinstance(row, list) and row for row in value):
return None, "Error: data must be a non-empty 2D list"
if len({len(row) for row in value}) != 1:
return None, "Error: data must be a rectangular 2D list"
matrix = np.array(value, dtype=float)
if matrix.ndim != 2 or matrix.size == 0:
return None, "Error: data must be a non-empty 2D list"
if not np.isfinite(matrix).all():
return None, "Error: data must contain only finite numeric values"
return matrix, None
def as_column(values):
return [[{"type": "Double", "basicValue": float(item)}] for item in values]
def as_matrix(values):
return [[{"type": "Double", "basicValue": float(item)} for item in row] for row in values]
def label_count_table(labels):
unique_labels, counts = np.unique(labels, return_counts=True)
rows = [[{"type": "String", "basicValue": "label"}, {"type": "String", "basicValue": "count"}]]
rows.extend(
[[{"type": "Double", "basicValue": float(label)}, {"type": "Double", "basicValue": float(count)}]
for label, count in zip(unique_labels.tolist(), counts.tolist())]
)
return rows
try:
data_np, error = parse_matrix(data)
if error:
return error
cluster_total = int(n_clusters)
if cluster_total < 1:
return "Error: n_clusters must be at least 1"
if cluster_total > data_np.shape[0]:
return "Error: n_clusters cannot exceed the number of samples"
init_value = str(kmeans_init).strip()
if init_value not in {"k-means++", "random"}:
return "Error: init must be 'k-means++' or 'random'"
if int(max_iter) < 1:
return "Error: max_iter must be at least 1"
if float(tol) < 0:
return "Error: tol must be non-negative"
seed = None if random_state in (None, "") else int(random_state)
fitted = SklearnKMeans(
n_clusters=cluster_total,
init=init_value,
max_iter=int(max_iter),
tol=float(tol),
random_state=seed,
n_init="auto"
).fit(data_np)
labels = fitted.labels_
cluster_count = int(np.unique(labels).size)
return {
"type": "Double",
"basicValue": float(fitted.inertia_),
"properties": {
"inertia": {"type": "Double", "basicValue": float(fitted.inertia_)},
"cluster_count": {"type": "Double", "basicValue": float(cluster_count)},
"labels": {"type": "Array", "elements": as_column(labels.tolist())},
"label_counts": {"type": "Array", "elements": label_count_table(labels)},
"cluster_centers": {"type": "Array", "elements": as_matrix(fitted.cluster_centers_.tolist())},
"n_iter": {"type": "Double", "basicValue": float(fitted.n_iter_)}
}
}
except Exception as e:
return f"Error: {str(e)}"