SPECTRAL_CLUSTER

Spectral clustering builds an affinity graph from the input samples, computes a low-dimensional embedding from the graph Laplacian, and then partitions that embedding into clusters.

The algorithm computes the unnormalized graph Laplacian L from the affinity matrix A and degree matrix D (D_{ii} = \sum_j A_{ij}):

L = D - A

It then takes the eigenvectors corresponding to the smallest eigenvalues of the Laplacian (or of a normalized variant such as L_{sym} = I - D^{-1/2} A D^{-1/2}) as a lower-dimensional embedding of the samples, and applies k-means or another label-assignment strategy to that embedding.
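The steps above can be sketched for the two-cluster case using only the standard library. This is an illustration under stated assumptions, not what scikit-learn runs internally: the affinity matrix is hand-written, the helper names (`laplacian`, `fiedler_vector`) are hypothetical, and a simple power iteration stands in for a real eigensolver. It finds the eigenvector of the second-smallest eigenvalue of L = D - A and splits the samples by its sign.

```python
import math

def laplacian(A):
    """Unnormalized Laplacian L = D - A, with D_ii = sum_j A_ij."""
    n = len(A)
    degrees = [sum(row) for row in A]
    return [[(degrees[i] if i == j else 0.0) - A[i][j] for j in range(n)]
            for i in range(n)]

def fiedler_vector(L, iterations=100):
    """Approximate the eigenvector of the second-smallest eigenvalue of L.

    Power iteration on (c*I - L); the constant vector (eigenvalue 0 of L)
    is projected out at every step so the iteration skips it.
    """
    n = len(L)
    c = 2.0 * max(L[i][i] for i in range(n))  # upper bound on L's eigenvalues
    v = [float(i) for i in range(n)]          # arbitrary deterministic start
    for _ in range(iterations):
        mean = sum(v) / n
        v = [x - mean for x in v]             # remove the constant component
        w = [c * v[i] - sum(L[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Hand-written affinity: two tight triangles joined by one weak edge (assumption).
eps = 0.01
A = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, eps, 0, 0],
    [0, 0, eps, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]

L = laplacian(A)
v = fiedler_vector(L)
labels = [0 if x < 0 else 1 for x in v]  # sign split = 1-D label assignment
print(labels)  # samples 0-2 share one label, samples 3-5 share the other
```

With more than two clusters, several eigenvectors are kept and a method such as k-means replaces the sign split.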

This wrapper accepts data with rows as samples and columns as features. It returns the fitted labels, a compact label count table, and the number of distinct labels found. The full affinity matrix is intentionally omitted to keep results compact.

Excel Usage

=SPECTRAL_CLUSTER(data, n_clusters, spec_affinity, gamma, n_neighbors, spec_assign, random_state)

data (list[list], required): 2D array of input data with rows as samples and columns as features.
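n_clusters (int, optional, default: 8): Number of clusters to extract from the spectral embedding. Must be at least 1 and no larger than the number of samples.
spec_affinity (str, optional, default: "rbf"): Strategy for constructing the affinity graph. Valid options: "rbf", "nearest_neighbors".
gamma (float, optional, default: 1): Kernel coefficient used for the RBF affinity. Must be greater than 0.
n_neighbors (int, optional, default: 10): Number of neighbors used when affinity is nearest_neighbors. Must be at least 1, and smaller than the number of samples when spec_affinity is nearest_neighbors.
spec_assign (str, optional, default: "kmeans"): Method used to convert the embedding into discrete labels. Valid options: "kmeans", "discretize", "cluster_qr".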
random_state (int, optional, default: null): Integer seed for reproducible spectral initialization. Leave blank for non-deterministic runs.
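To make the role of gamma concrete, here is a minimal sketch of the RBF weight between two samples, assuming the usual form exp(-gamma * squared Euclidean distance). The helper name `rbf_weight` is hypothetical and is not part of the wrapper.

```python
import math

def rbf_weight(p, q, gamma=1.0):
    """RBF affinity between two samples: exp(-gamma * ||p - q||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(p, q))
    return math.exp(-gamma * sq_dist)

near = rbf_weight((0, 0), (0, 1))                # squared distance 1
far = rbf_weight((0, 0), (5, 5))                 # squared distance 50
sharper = rbf_weight((0, 0), (0, 1), gamma=4.0)  # same pair, larger gamma

print(near, far, sharper)
```

A larger gamma makes affinities fall off faster with distance, so the graph keeps only very close neighbors strongly connected; a smaller gamma links more distant samples.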

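Returns (dict): Excel data type whose basic value is the cluster count, with properties cluster_count, labels, label_counts, affinity, and assign_labels. On invalid input the function returns an error string instead.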
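When the function is called from Python rather than Excel, the nested return value can be flattened into plain values. The sketch below assumes the dict layout shown in the expected outputs on this page; `unpack_result` is a hypothetical helper and `sample` is a hand-written, shortened stand-in for a real result.

```python
def unpack_result(result):
    """Flatten the Excel-style dict into plain Python values (hypothetical helper)."""
    if isinstance(result, str):  # the wrapper reports failures as "Error: ..." strings
        raise ValueError(result)
    props = result["properties"]
    labels = [int(row[0]["basicValue"]) for row in props["labels"]["elements"]]
    _header, *rows = props["label_counts"]["elements"]  # first row is the header
    counts = {int(r[0]["basicValue"]): int(r[1]["basicValue"]) for r in rows}
    return {
        "cluster_count": int(result["basicValue"]),
        "labels": labels,
        "label_counts": counts,
        "affinity": props["affinity"]["basicValue"],
        "assign_labels": props["assign_labels"]["basicValue"],
    }

# Hand-written two-sample result in the same layout (not real output).
sample = {
    "type": "Double",
    "basicValue": 2.0,
    "properties": {
        "cluster_count": {"type": "Double", "basicValue": 2.0},
        "labels": {"type": "Array", "elements": [
            [{"type": "Double", "basicValue": 0.0}],
            [{"type": "Double", "basicValue": 1.0}],
        ]},
        "label_counts": {"type": "Array", "elements": [
            [{"type": "String", "basicValue": "label"}, {"type": "String", "basicValue": "count"}],
            [{"type": "Double", "basicValue": 0.0}, {"type": "Double", "basicValue": 1.0}],
            [{"type": "Double", "basicValue": 1.0}, {"type": "Double", "basicValue": 1.0}],
        ]},
        "affinity": {"type": "String", "basicValue": "rbf"},
        "assign_labels": {"type": "String", "basicValue": "kmeans"},
    },
}

flat = unpack_result(sample)
print(flat["labels"], flat["label_counts"])
```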

Example 1: Split two separated point clouds with the RBF affinity

Inputs:

data		n_clusters	spec_affinity	spec_assign	random_state
0	0	2	rbf	kmeans	0
0	1
1	0
5	5
5	6
6	5

Excel formula:

=SPECTRAL_CLUSTER({0,0;0,1;1,0;5,5;5,6;6,5}, 2, "rbf", , , "kmeans", 0)

Expected output:

{"type":"Double","basicValue":2,"properties":{"cluster_count":{"type":"Double","basicValue":2},"labels":{"type":"Array","elements":[[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":1}]]},"label_counts":{"type":"Array","elements":[[{"type":"String","basicValue":"label"},{"type":"String","basicValue":"count"}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":3}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":3}]]},"affinity":{"type":"String","basicValue":"rbf"},"assign_labels":{"type":"String","basicValue":"kmeans"}}}

Example 2: Use nearest-neighbor affinity on two compact groups

Inputs:

data		n_clusters	spec_affinity	n_neighbors	spec_assign	random_state
0	0	2	nearest_neighbors	5	kmeans	0
0	1
1	0
5	5
5	6
6	5

Excel formula:

=SPECTRAL_CLUSTER({0,0;0,1;1,0;5,5;5,6;6,5}, 2, "nearest_neighbors", , 5, "kmeans", 0)

Expected output:

{"type":"Double","basicValue":2,"properties":{"cluster_count":{"type":"Double","basicValue":2},"labels":{"type":"Array","elements":[[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":1}]]},"label_counts":{"type":"Array","elements":[[{"type":"String","basicValue":"label"},{"type":"String","basicValue":"count"}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":3}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":3}]]},"affinity":{"type":"String","basicValue":"nearest_neighbors"},"assign_labels":{"type":"String","basicValue":"kmeans"}}}

Example 3: Use discretization to assign labels in the spectral embedding

Inputs:

data		n_clusters	spec_affinity	spec_assign	random_state
1	1	2	rbf	discretize	1
1.2	0.8
0.8	1.1
8	8
8.2	7.9
7.8	8.1

Excel formula:

=SPECTRAL_CLUSTER({1,1;1.2,0.8;0.8,1.1;8,8;8.2,7.9;7.8,8.1}, 2, "rbf", , , "discretize", 1)

Expected output:

{"type":"Double","basicValue":2,"properties":{"cluster_count":{"type":"Double","basicValue":2},"labels":{"type":"Array","elements":[[{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0}]]},"label_counts":{"type":"Array","elements":[[{"type":"String","basicValue":"label"},{"type":"String","basicValue":"count"}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":3}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":3}]]},"affinity":{"type":"String","basicValue":"rbf"},"assign_labels":{"type":"String","basicValue":"discretize"}}}
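Label ids are arbitrary: Example 1 reports the first group as 0, while this example reports it as 1, yet both split the samples three and three in the same way. When comparing results, it can help to check that two labelings describe the same partition regardless of the ids used. A minimal sketch, with `same_partition` as a hypothetical helper:

```python
def same_partition(labels_a, labels_b):
    """True when two labelings group the samples identically, ignoring label ids."""
    if len(labels_a) != len(labels_b):
        return False
    forward, backward = {}, {}
    for a, b in zip(labels_a, labels_b):
        # Each id in one labeling must map to exactly one id in the other.
        if forward.setdefault(a, b) != b or backward.setdefault(b, a) != a:
            return False
    return True

print(same_partition([0, 0, 0, 1, 1, 1], [1, 1, 1, 0, 0, 0]))  # True
print(same_partition([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 1]))  # False
```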

Example 4: Use cluster QR label extraction on two separated groups

Inputs:

data	n_clusters	spec_affinity	spec_assign	random_state
0	2	rbf	cluster_qr	2
0.2
0.4
4.8
5
5.2

Excel formula:

=SPECTRAL_CLUSTER({0;0.2;0.4;4.8;5;5.2}, 2, "rbf", , , "cluster_qr", 2)

Expected output:

Python Code

external_packages = ['scikit-learn']

import numpy as np
from sklearn.cluster import SpectralClustering as SklearnSpectralClustering

def spectral_cluster(data, n_clusters=8, spec_affinity='rbf', gamma=1, n_neighbors=10, spec_assign='kmeans', random_state=None):
    """
    Cluster samples by partitioning a graph-based spectral embedding.

    See: https://scikit-learn.org/stable/modules/generated/sklearn.cluster.SpectralClustering.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        data (list[list]): 2D array of input data with rows as samples and columns as features.
        n_clusters (int, optional): Number of clusters to extract from the spectral embedding. Default is 8.
        spec_affinity (str, optional): Strategy for constructing the affinity graph. Valid options: 'rbf', 'nearest_neighbors'. Default is 'rbf'.
        gamma (float, optional): Kernel coefficient used for the RBF affinity. Must be greater than 0. Default is 1.
        n_neighbors (int, optional): Number of neighbors used when affinity is nearest_neighbors. Default is 10.
        spec_assign (str, optional): Method used to convert the embedding into discrete labels. Valid options: 'kmeans', 'discretize', 'cluster_qr'. Default is 'kmeans'.
        random_state (int, optional): Integer seed for reproducible spectral initialization. Leave blank for non-deterministic runs. Default is None.

    Returns:
        dict: Excel data type containing cluster counts, labels, label counts, and the key spectral settings used.
    """
    def to2d(value):
        return [[value]] if not isinstance(value, list) else value

    def parse_matrix(value):
        value = to2d(value)
        if not isinstance(value, list) or not value or not all(isinstance(row, list) and row for row in value):
            return None, "Error: data must be a non-empty 2D list"
        if len({len(row) for row in value}) != 1:
            return None, "Error: data must be a rectangular 2D list"
        matrix = np.array(value, dtype=float)
        if matrix.ndim != 2 or matrix.size == 0:
            return None, "Error: data must be a non-empty 2D list"
        if not np.isfinite(matrix).all():
            return None, "Error: data must contain only finite numeric values"
        return matrix, None

    def as_column(values):
        return [[{"type": "Double", "basicValue": float(item)}] for item in values]

    def label_count_table(labels):
        unique_labels, counts = np.unique(labels, return_counts=True)
        rows = [[{"type": "String", "basicValue": "label"}, {"type": "String", "basicValue": "count"}]]
        rows.extend(
            [[{"type": "Double", "basicValue": float(label)}, {"type": "Double", "basicValue": float(count)}]
             for label, count in zip(unique_labels.tolist(), counts.tolist())]
        )
        return rows

    try:
        data_np, error = parse_matrix(data)
        if error:
            return error

        cluster_total = int(n_clusters)
        if cluster_total < 1:
            return "Error: n_clusters must be at least 1"
        if cluster_total > data_np.shape[0]:
            return "Error: n_clusters cannot exceed the number of samples"

        affinity_value = str(spec_affinity).strip()
        if affinity_value not in {"rbf", "nearest_neighbors"}:
            return "Error: spec_affinity must be 'rbf' or 'nearest_neighbors'"

        label_mode = str(spec_assign).strip()
        if label_mode not in {"kmeans", "discretize", "cluster_qr"}:
            return "Error: spec_assign must be 'kmeans', 'discretize', or 'cluster_qr'"

        if float(gamma) <= 0:
            return "Error: gamma must be greater than 0"
        if int(n_neighbors) < 1:
            return "Error: n_neighbors must be at least 1"
        if affinity_value == "nearest_neighbors" and int(n_neighbors) >= data_np.shape[0]:
            return "Error: n_neighbors must be smaller than the number of samples when affinity is nearest_neighbors"

        seed = None if random_state in (None, "") else int(random_state)
        fitted = SklearnSpectralClustering(
            n_clusters=cluster_total,
            affinity=affinity_value,
            gamma=float(gamma),
            n_neighbors=int(n_neighbors),
            assign_labels=label_mode,
            random_state=seed,
            n_init=10
        ).fit(data_np)

        labels = fitted.labels_
        cluster_count = int(np.unique(labels).size)

        return {
            "type": "Double",
            "basicValue": float(cluster_count),
            "properties": {
                "cluster_count": {"type": "Double", "basicValue": float(cluster_count)},
                "labels": {"type": "Array", "elements": as_column(labels.tolist())},
                "label_counts": {"type": "Array", "elements": label_count_table(labels)},
                "affinity": {"type": "String", "basicValue": affinity_value},
                "assign_labels": {"type": "String", "basicValue": label_mode}
            }
        }
    except Exception as e:
        return f"Error: {str(e)}"
