SPECTRAL_CLUSTER

Spectral clustering builds an affinity graph from the input samples, computes a low-dimensional embedding from the graph Laplacian, and then partitions that embedding into clusters.

Given the affinity matrix A and the degree matrix D, where D_{ii} = \sum_j A_{ij}, the unnormalized graph Laplacian is:

L = D - A
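As a small numerical illustration, the sketch below builds an RBF affinity matrix for the six points of Example 1 with numpy, forms D and L = D - A, and checks two standard properties: every row of L sums to zero, and L is positive semidefinite. Zeroing the diagonal of A (no self-loops) is an assumption of this sketch; it does not change L = D - A, because a diagonal entry of A adds equally to D_{ii} and to A_{ii}.

```python
import numpy as np

def rbf_affinity(X, gamma=1.0):
    """Dense RBF affinity A_ij = exp(-gamma * ||x_i - x_j||^2) with a zero diagonal."""
    X = np.asarray(X, dtype=float)
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    A = np.exp(-gamma * sq_dists)
    np.fill_diagonal(A, 0.0)  # assumption of this sketch: no self-loops
    return A

def unnormalized_laplacian(A):
    """L = D - A, where D holds the row sums (degrees) of A on its diagonal."""
    D = np.diag(A.sum(axis=1))
    return D - A

# The six points from Example 1: two well-separated groups of three.
X = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
A = rbf_affinity(X, gamma=1.0)
L = unnormalized_laplacian(A)
eigvals = np.linalg.eigvalsh(L)  # ascending order

print(np.allclose(L.sum(axis=1), 0.0))  # True: every row of L sums to zero
print(eigvals.min() > -1e-10)           # True: L is positive semidefinite
```

Here the two smallest eigenvalues are both close to zero, one per nearly disconnected group, while the third is clearly positive. That gap is what the embedding step relies on.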

The eigenvectors belonging to the smallest eigenvalues of the Laplacian, or of a normalized variant such as L_{sym} = I - D^{-1/2} A D^{-1/2}, define a low-dimensional embedding of the samples, and k-means or another label assignment strategy is then applied in that embedding. The scikit-learn implementation used by this wrapper works with a normalized Laplacian.
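The steps above can be sketched end to end with numpy and scikit-learn's KMeans. This is the textbook normalized variant of Ng, Jordan, and Weiss, written out for illustration; it is not the code scikit-learn runs inside SpectralClustering, which uses its own eigensolvers and embedding scaling. The function name njw_spectral_labels is hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans

def njw_spectral_labels(A, n_clusters, random_state=0):
    """Normalized spectral clustering in the Ng-Jordan-Weiss style (illustrative sketch)."""
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    # L_sym = I - D^{-1/2} A D^{-1/2}
    L_sym = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order, so the first columns
    # are the eigenvectors of the smallest eigenvalues.
    _, eigvecs = np.linalg.eigh(L_sym)
    U = eigvecs[:, :n_clusters]
    # Scale each row to unit length before clustering (the NJW normalization step).
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    return km.fit_predict(U)

# Example 1 data with an RBF affinity (gamma = 1, diagonal left at 1 as in a plain kernel).
X = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
A = np.exp(-1.0 * sq_dists)

labels = njw_spectral_labels(A, n_clusters=2)
print(labels)  # first three points share one label, last three share the other
```

Row normalization places every point of a nearly disconnected group on the same unit direction, which makes the final k-means step easy.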

This wrapper accepts data with rows as samples and columns as features. It returns the fitted labels, a compact label-count table, the number of clusters found, and the affinity and label-assignment settings used. The full affinity matrix is intentionally omitted to keep the result compact.

Excel Usage

=SPECTRAL_CLUSTER(data, n_clusters, spec_affinity, gamma, n_neighbors, spec_assign, random_state)
  • data (list[list], required): 2D array of input data with rows as samples and columns as features.
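  • n_clusters (int, optional, default: 8): Number of clusters to extract from the spectral embedding. Must be at least 1 and no larger than the number of samples.
  • spec_affinity (str, optional, default: "rbf"): Strategy for constructing the affinity graph: "rbf" (Gaussian kernel) or "nearest_neighbors" (k-nearest-neighbor graph).
  • gamma (float, optional, default: 1): Kernel coefficient for the RBF affinity, exp(-gamma ||x_i - x_j||^2). Must be greater than 0; used only when spec_affinity is "rbf".
  • n_neighbors (int, optional, default: 10): Number of neighbors used when spec_affinity is "nearest_neighbors". Must be at least 1 and, for that affinity, smaller than the number of samples.
  • spec_assign (str, optional, default: "kmeans"): Method used to convert the embedding into discrete labels: "kmeans", "discretize", or "cluster_qr".
  • random_state (int, optional, default: null): Integer seed that makes the eigensolver initialization and the label assignment reproducible. Leave blank for non-deterministic runs.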
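The two affinity options use gamma and n_neighbors differently. The sketch below rebuilds each affinity matrix by hand on the Example 1 and Example 2 data and compares it with the fitted model's affinity_matrix_ attribute. The hand-built constructions (an RBF kernel for "rbf", and a symmetrized k-nearest-neighbor connectivity graph that counts each point as its own neighbor for "nearest_neighbors") reflect our reading of scikit-learn's SpectralClustering; the comparison is what checks them, so treat them as an assumption rather than a documented guarantee.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.neighbors import kneighbors_graph

X = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)

# spec_affinity="rbf": dense kernel exp(-gamma * ||x_i - x_j||^2); n_neighbors unused.
rbf_model = SpectralClustering(n_clusters=2, affinity="rbf", gamma=1.0, random_state=0).fit(X)
expected_rbf = rbf_kernel(X, gamma=1.0)

# spec_affinity="nearest_neighbors": symmetrized 0/1 k-NN connectivity graph; gamma unused.
knn_model = SpectralClustering(
    n_clusters=2, affinity="nearest_neighbors", n_neighbors=5, random_state=0
).fit(X)
connectivity = kneighbors_graph(X, n_neighbors=5, include_self=True)
expected_knn = 0.5 * (connectivity + connectivity.T)  # sparse matrix

print(np.allclose(rbf_model.affinity_matrix_, expected_rbf))
print(np.allclose(knn_model.affinity_matrix_.toarray(), expected_knn.toarray()))
```

Because the nearest-neighbor graph only holds entries of 0, 0.5, and 1, gamma has no effect there; conversely, the RBF kernel ignores n_neighbors.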

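Returns (dict): Excel data type whose displayed value is the number of clusters found. Its properties are cluster_count, labels (one label per sample, as a column), label_counts (a label/count table with a header row), affinity, and assign_labels. If validation fails, an error string is returned instead.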
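Outside Excel, for example when testing the Python function directly, the returned dict can be unpacked in a few lines. The helper read_result below is hypothetical (it is not part of the wrapper), and the JSON string is Example 1's expected output copied from this page.

```python
import json

# Expected output of Example 1, copied from this page.
EXAMPLE_1_JSON = (
    '{"type":"Double","basicValue":2,"properties":{'
    '"cluster_count":{"type":"Double","basicValue":2},'
    '"labels":{"type":"Array","elements":['
    '[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0}],'
    '[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1}],'
    '[{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":1}]]},'
    '"label_counts":{"type":"Array","elements":['
    '[{"type":"String","basicValue":"label"},{"type":"String","basicValue":"count"}],'
    '[{"type":"Double","basicValue":0},{"type":"Double","basicValue":3}],'
    '[{"type":"Double","basicValue":1},{"type":"Double","basicValue":3}]]},'
    '"affinity":{"type":"String","basicValue":"rbf"},'
    '"assign_labels":{"type":"String","basicValue":"kmeans"}}}'
)

def read_result(result):
    """Hypothetical helper: flatten the entity returned by spectral_cluster."""
    props = result["properties"]
    # labels is a one-column array: one single-cell row per sample.
    labels = [int(row[0]["basicValue"]) for row in props["labels"]["elements"]]
    # Skip the header row ("label", "count") of the label-count table.
    counts = {
        int(label["basicValue"]): int(count["basicValue"])
        for label, count in props["label_counts"]["elements"][1:]
    }
    return int(result["basicValue"]), labels, counts

count, labels, counts = read_result(json.loads(EXAMPLE_1_JSON))
print(count, labels, counts)  # 2 [0, 0, 0, 1, 1, 1] {0: 3, 1: 3}
```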

Example 1: Split two separated point clouds with the RBF affinity

Inputs:

  • data (6 rows × 2 columns): {0,0; 0,1; 1,0; 5,5; 5,6; 6,5}
  • n_clusters: 2
  • spec_affinity: rbf
  • spec_assign: kmeans
  • random_state: 0

Excel formula:

=SPECTRAL_CLUSTER({0,0;0,1;1,0;5,5;5,6;6,5}, 2, "rbf", 1, 10, "kmeans", 0)

Expected output:

{"type":"Double","basicValue":2,"properties":{"cluster_count":{"type":"Double","basicValue":2},"labels":{"type":"Array","elements":[[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":1}]]},"label_counts":{"type":"Array","elements":[[{"type":"String","basicValue":"label"},{"type":"String","basicValue":"count"}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":3}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":3}]]},"affinity":{"type":"String","basicValue":"rbf"},"assign_labels":{"type":"String","basicValue":"kmeans"}}}

Example 2: Use nearest-neighbor affinity on two compact groups

Inputs:

  • data (6 rows × 2 columns): {0,0; 0,1; 1,0; 5,5; 5,6; 6,5}
  • n_clusters: 2
  • spec_affinity: nearest_neighbors
  • n_neighbors: 5
  • spec_assign: kmeans
  • random_state: 0

Excel formula:

=SPECTRAL_CLUSTER({0,0;0,1;1,0;5,5;5,6;6,5}, 2, "nearest_neighbors", 1, 5, "kmeans", 0)

Expected output:

{"type":"Double","basicValue":2,"properties":{"cluster_count":{"type":"Double","basicValue":2},"labels":{"type":"Array","elements":[[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":1}]]},"label_counts":{"type":"Array","elements":[[{"type":"String","basicValue":"label"},{"type":"String","basicValue":"count"}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":3}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":3}]]},"affinity":{"type":"String","basicValue":"nearest_neighbors"},"assign_labels":{"type":"String","basicValue":"kmeans"}}}

Example 3: Use discretization to assign labels in the spectral embedding

Inputs:

  • data (6 rows × 2 columns): {1,1; 1.2,0.8; 0.8,1.1; 8,8; 8.2,7.9; 7.8,8.1}
  • n_clusters: 2
  • spec_affinity: rbf
  • spec_assign: discretize
  • random_state: 1

Excel formula:

=SPECTRAL_CLUSTER({1,1;1.2,0.8;0.8,1.1;8,8;8.2,7.9;7.8,8.1}, 2, "rbf", 1, 10, "discretize", 1)

Expected output:

{"type":"Double","basicValue":2,"properties":{"cluster_count":{"type":"Double","basicValue":2},"labels":{"type":"Array","elements":[[{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0}]]},"label_counts":{"type":"Array","elements":[[{"type":"String","basicValue":"label"},{"type":"String","basicValue":"count"}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":3}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":3}]]},"affinity":{"type":"String","basicValue":"rbf"},"assign_labels":{"type":"String","basicValue":"discretize"}}}
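Example 3 returns labels 1, 1, 1, 0, 0, 0, while Examples 1, 2, and 4 return 0, 0, 0, 1, 1, 1 for the same kind of split. The integer assigned to each cluster is arbitrary, so comparisons between runs or settings should look at the partition rather than the raw IDs. A minimal sketch, assuming scikit-learn is installed, refits Example 3 directly and uses the adjusted Rand index (ARI), which equals 1.0 for identical partitions regardless of how the clusters are numbered:

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics import adjusted_rand_score

# Data and settings from Example 3.
X = np.array([[1, 1], [1.2, 0.8], [0.8, 1.1], [8, 8], [8.2, 7.9], [7.8, 8.1]])
labels = SpectralClustering(
    n_clusters=2, affinity="rbf", gamma=1.0, assign_labels="discretize", random_state=1
).fit(X).labels_

documented = [1, 1, 1, 0, 0, 0]  # labels listed in Example 3's expected output
relabelled = [0, 0, 0, 1, 1, 1]  # the same partition with the IDs swapped

# ARI compares partitions, so it ignores which integer each cluster received.
print(adjusted_rand_score(documented, relabelled))  # 1.0
print(adjusted_rand_score(documented, labels))
```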

Example 4: Use cluster QR label extraction on two separated groups

Inputs:

  • data (6 rows × 1 column): {0; 0.2; 0.4; 4.8; 5; 5.2}
  • n_clusters: 2
  • spec_affinity: rbf
  • spec_assign: cluster_qr
  • random_state: 2

Excel formula:

=SPECTRAL_CLUSTER({0;0.2;0.4;4.8;5;5.2}, 2, "rbf", 1, 10, "cluster_qr", 2)

Expected output:

{"type":"Double","basicValue":2,"properties":{"cluster_count":{"type":"Double","basicValue":2},"labels":{"type":"Array","elements":[[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":1}]]},"label_counts":{"type":"Array","elements":[[{"type":"String","basicValue":"label"},{"type":"String","basicValue":"count"}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":3}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":3}]]},"affinity":{"type":"String","basicValue":"rbf"},"assign_labels":{"type":"String","basicValue":"cluster_qr"}}}

Python Code

import numpy as np
from sklearn.cluster import SpectralClustering as SklearnSpectralClustering

def spectral_cluster(data, n_clusters=8, spec_affinity='rbf', gamma=1, n_neighbors=10, spec_assign='kmeans', random_state=None):
    """
    Cluster samples by partitioning a graph-based spectral embedding.

    See: https://scikit-learn.org/stable/modules/generated/sklearn.cluster.SpectralClustering.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        data (list[list]): 2D array of input data with rows as samples and columns as features.
        n_clusters (int, optional): Number of clusters to extract from the spectral embedding. Default is 8.
        spec_affinity (str, optional): Strategy for constructing the affinity graph. Valid options: RBF, Nearest Neighbors. Default is 'rbf'.
        gamma (float, optional): Kernel coefficient used for the RBF affinity. Default is 1.
        n_neighbors (int, optional): Number of neighbors used when affinity is nearest_neighbors. Default is 10.
        spec_assign (str, optional): Method used to convert the embedding into discrete labels. Valid options: K-means, Discretize, Cluster QR. Default is 'kmeans'.
        random_state (int, optional): Integer seed for reproducible spectral initialization. Leave blank for non-deterministic runs. Default is None.

    Returns:
        dict: Excel data type containing cluster counts, labels, label counts, and the key spectral settings used.
    """
    def to2d(value):
        return [[value]] if not isinstance(value, list) else value

    def parse_matrix(value):
        value = to2d(value)
        if not isinstance(value, list) or not value or not all(isinstance(row, list) and row for row in value):
            return None, "Error: data must be a non-empty 2D list"
        if len({len(row) for row in value}) != 1:
            return None, "Error: data must be a rectangular 2D list"
        matrix = np.array(value, dtype=float)
        if matrix.ndim != 2 or matrix.size == 0:
            return None, "Error: data must be a non-empty 2D list"
        if not np.isfinite(matrix).all():
            return None, "Error: data must contain only finite numeric values"
        return matrix, None

    def as_column(values):
        return [[{"type": "Double", "basicValue": float(item)}] for item in values]

    def label_count_table(labels):
        unique_labels, counts = np.unique(labels, return_counts=True)
        rows = [[{"type": "String", "basicValue": "label"}, {"type": "String", "basicValue": "count"}]]
        rows.extend(
            [[{"type": "Double", "basicValue": float(label)}, {"type": "Double", "basicValue": float(count)}]
             for label, count in zip(unique_labels.tolist(), counts.tolist())]
        )
        return rows

    try:
        data_np, error = parse_matrix(data)
        if error:
            return error

        cluster_total = int(n_clusters)
        if cluster_total < 1:
            return "Error: n_clusters must be at least 1"
        if cluster_total > data_np.shape[0]:
            return "Error: n_clusters cannot exceed the number of samples"

        affinity_value = str(spec_affinity).strip()
        if affinity_value not in {"rbf", "nearest_neighbors"}:
            return "Error: spec_affinity must be 'rbf' or 'nearest_neighbors'"

        label_mode = str(spec_assign).strip()
        if label_mode not in {"kmeans", "discretize", "cluster_qr"}:
            return "Error: spec_assign must be 'kmeans', 'discretize', or 'cluster_qr'"

        if float(gamma) <= 0:
            return "Error: gamma must be greater than 0"
        if int(n_neighbors) < 1:
            return "Error: n_neighbors must be at least 1"
        if affinity_value == "nearest_neighbors" and int(n_neighbors) >= data_np.shape[0]:
            return "Error: n_neighbors must be smaller than the number of samples when affinity is nearest_neighbors"

        seed = None if random_state in (None, "") else int(random_state)
        fitted = SklearnSpectralClustering(
            n_clusters=cluster_total,
            affinity=affinity_value,
            gamma=float(gamma),
            n_neighbors=int(n_neighbors),
            assign_labels=label_mode,
            random_state=seed,
            n_init=10
        ).fit(data_np)

        labels = fitted.labels_
        cluster_count = int(np.unique(labels).size)

        return {
            "type": "Double",
            "basicValue": float(cluster_count),
            "properties": {
                "cluster_count": {"type": "Double", "basicValue": float(cluster_count)},
                "labels": {"type": "Array", "elements": as_column(labels.tolist())},
                "label_counts": {"type": "Array", "elements": label_count_table(labels)},
                "affinity": {"type": "String", "basicValue": affinity_value},
                "assign_labels": {"type": "String", "basicValue": label_mode}
            }
        }
    except Exception as e:
        return f"Error: {str(e)}"
