TSNE_EMBED

t-distributed Stochastic Neighbor Embedding (t-SNE) maps high-dimensional samples into a low-dimensional space by preserving local neighborhoods rather than global linear structure. This makes it most useful for visualization-oriented exploratory analysis.

The algorithm minimizes the Kullback-Leibler divergence between the joint probabilities of the high-dimensional and low-dimensional representations:

KL(P || Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}

where P and Q represent the pairwise similarities in the original and embedded spaces respectively.
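The objective above can be sketched directly in NumPy. This is an illustrative sketch, not the library implementation: the fixed bandwidth `sigma` in `pairwise_p` stands in for the per-point bandwidths that t-SNE actually calibrates to match the requested perplexity, and the helper names (`pairwise_p`, `pairwise_q`, `kl_divergence`) are ours.

```python
import numpy as np

def pairwise_p(X, sigma=1.0):
    # High-dimensional similarities: Gaussian kernel on squared distances,
    # normalized so the off-diagonal entries sum to 1.
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    P = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(P, 0.0)
    return P / P.sum()

def pairwise_q(Y):
    # Low-dimensional similarities: Student-t kernel with one degree of
    # freedom, which gives t-SNE its heavy-tailed embedding space.
    d2 = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    Q = 1.0 / (1.0 + d2)
    np.fill_diagonal(Q, 0.0)
    return Q / Q.sum()

def kl_divergence(P, Q):
    # KL(P || Q) over the off-diagonal pairs; p_ij = 0 terms contribute nothing.
    mask = (P > 0) & ~np.eye(P.shape[0], dtype=bool)
    return float(np.sum(P[mask] * np.log(P[mask] / Q[mask])))
```

Identical distributions give a divergence of exactly zero, and any mismatch between the high- and low-dimensional similarities yields a positive value, which is what the optimizer drives down.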

This wrapper accepts rows as samples and columns as features. It returns the fitted embedding together with the final KL divergence, the number of embedding dimensions, and the iteration count reported by the fitted estimator.

Excel Usage

=TSNE_EMBED(data, n_components, perplexity, learning_rate, max_iter, tsne_init, tsne_method, random_state)
  • data (list[list], required): 2D array of numeric input data with rows as samples and columns as features.
  • n_components (int, optional, default: 2): Dimension of the embedded output space.
  • perplexity (float, optional, default: 30): Effective neighborhood size used in the t-SNE objective.
  • learning_rate (float, optional, default: 200): Positive optimizer learning rate for the embedding updates.
  • max_iter (int, optional, default: 1000): Maximum number of optimization iterations.
  • tsne_init (str, optional, default: "pca"): Initialization scheme for the starting embedding. Valid options: "pca", "random".
  • tsne_method (str, optional, default: "barnes_hut"): Gradient computation method used during optimization. Valid options: "barnes_hut", "exact".
  • random_state (int, optional, default: null): Integer seed for deterministic initialization. Leave blank for the estimator default.

Returns (dict): Excel data type containing the fitted embedding and final KL divergence.

Example 1: Embed two separated clusters with the exact solver

Inputs:

data: {0,0; 0,1; 1,0; 1,1; 5,5; 5,6; 6,5; 6,6}
n_components: 2, perplexity: 2.5, learning_rate: 80, max_iter: 500
tsne_init: pca, tsne_method: exact, random_state: 0

Excel formula:

=TSNE_EMBED({0,0;0,1;1,0;1,1;5,5;5,6;6,5;6,6}, 2, 2.5, 80, 500, "pca", "exact", 0)

Expected output:

{"type":"Double","basicValue":1.27152,"properties":{"kl_divergence":{"type":"Double","basicValue":1.27152},"component_count":{"type":"Double","basicValue":2},"sample_count":{"type":"Double","basicValue":8},"feature_count":{"type":"Double","basicValue":2},"embedding":{"type":"Array","elements":[[{"type":"Double","basicValue":308.174},{"type":"Double","basicValue":-596.145}],[{"type":"Double","basicValue":565.379},{"type":"Double","basicValue":140.845}],[{"type":"Double","basicValue":710.635},{"type":"Double","basicValue":-1128.97}],[{"type":"Double","basicValue":-343.572},{"type":"Double","basicValue":1700.94}],[{"type":"Double","basicValue":1682.58},{"type":"Double","basicValue":-627.124}],[{"type":"Double","basicValue":-1589.38},{"type":"Double","basicValue":723.269}],[{"type":"Double","basicValue":-803.921},{"type":"Double","basicValue":71.7089}],[{"type":"Double","basicValue":-435.913},{"type":"Double","basicValue":644.484}]]},"n_iter":{"type":"Double","basicValue":499}}}

Example 2: Use seeded random initialization on three compact groups

Inputs:

data: {0,0,0; 0.2,0.1,0; 5,5,5; 5.1,5.2,5; 10,0,10; 10.2,0.1,10.1}
n_components: 2, perplexity: 2, learning_rate: 60, max_iter: 500
tsne_init: random, tsne_method: exact, random_state: 3

Excel formula:

=TSNE_EMBED({0,0,0;0.2,0.1,0;5,5,5;5.1,5.2,5;10,0,10;10.2,0.1,10.1}, 2, 2, 60, 500, "random", "exact", 3)

Expected output:

{"type":"Double","basicValue":1.36008,"properties":{"kl_divergence":{"type":"Double","basicValue":1.36008},"component_count":{"type":"Double","basicValue":2},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":3},"embedding":{"type":"Array","elements":[[{"type":"Double","basicValue":914.209},{"type":"Double","basicValue":-274.882}],[{"type":"Double","basicValue":-890.978},{"type":"Double","basicValue":295.72}],[{"type":"Double","basicValue":-234.515},{"type":"Double","basicValue":453.258}],[{"type":"Double","basicValue":256.056},{"type":"Double","basicValue":-448.663}],[{"type":"Double","basicValue":156.849},{"type":"Double","basicValue":102.724}],[{"type":"Double","basicValue":-143.79},{"type":"Double","basicValue":-103.507}]]},"n_iter":{"type":"Double","basicValue":499}}}

Example 3: Embed one-dimensional samples into two dimensions

Inputs:

data: {0; 1; 2; 3; 4; 5}
n_components: 2, perplexity: 2, learning_rate: 50, max_iter: 500
tsne_init: random, tsne_method: exact, random_state: 1

Excel formula:

=TSNE_EMBED({0;1;2;3;4;5}, 2, 2, 50, 500, "random", "exact", 1)

Expected output:

{"type":"Double","basicValue":0.123383,"properties":{"kl_divergence":{"type":"Double","basicValue":0.123383},"component_count":{"type":"Double","basicValue":2},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":1},"embedding":{"type":"Array","elements":[[{"type":"Double","basicValue":-145.516},{"type":"Double","basicValue":21.1913}],[{"type":"Double","basicValue":-93.6204},{"type":"Double","basicValue":23.9552}],[{"type":"Double","basicValue":-30.9413},{"type":"Double","basicValue":25.6826}],[{"type":"Double","basicValue":37.3494},{"type":"Double","basicValue":26.9504}],[{"type":"Double","basicValue":100.051},{"type":"Double","basicValue":27.51}],[{"type":"Double","basicValue":152.001},{"type":"Double","basicValue":26.1001}]]},"n_iter":{"type":"Double","basicValue":499}}}

Example 4: Produce a three-dimensional embedding with Barnes-Hut t-SNE

Inputs:

data: {0,0,1; 0.1,0.2,1.1; 1,1,2; 1.2,1.1,2.1; 5,5,0; 5.2,5.1,0.2; 6,6,1; 6.1,6.2,1.1}
n_components: 3, perplexity: 2.5, learning_rate: 75, max_iter: 500
tsne_init: pca, tsne_method: barnes_hut, random_state: 5

Excel formula:

=TSNE_EMBED({0,0,1;0.1,0.2,1.1;1,1,2;1.2,1.1,2.1;5,5,0;5.2,5.1,0.2;6,6,1;6.1,6.2,1.1}, 3, 2.5, 75, 500, "pca", "barnes_hut", 5)

Expected output:

{"type":"Double","basicValue":1.79411,"properties":{"kl_divergence":{"type":"Double","basicValue":1.79411},"component_count":{"type":"Double","basicValue":3},"sample_count":{"type":"Double","basicValue":8},"feature_count":{"type":"Double","basicValue":3},"embedding":{"type":"Array","elements":[[{"type":"Double","basicValue":180.549},{"type":"Double","basicValue":-89.0916},{"type":"Double","basicValue":19.9081}],[{"type":"Double","basicValue":-64.7},{"type":"Double","basicValue":68.89},{"type":"Double","basicValue":-60.3671}],[{"type":"Double","basicValue":89.9451},{"type":"Double","basicValue":12.4007},{"type":"Double","basicValue":13.6712}],[{"type":"Double","basicValue":52.8532},{"type":"Double","basicValue":3.74339},{"type":"Double","basicValue":-82.2136}],[{"type":"Double","basicValue":-29.3893},{"type":"Double","basicValue":-133.062},{"type":"Double","basicValue":3.69885}],[{"type":"Double","basicValue":-54.7223},{"type":"Double","basicValue":-38.0214},{"type":"Double","basicValue":32.0658}],[{"type":"Double","basicValue":-95.3113},{"type":"Double","basicValue":-87.4605},{"type":"Double","basicValue":133.372}],[{"type":"Double","basicValue":193.254},{"type":"Double","basicValue":649.021},{"type":"Double","basicValue":40.2834}]]},"n_iter":{"type":"Double","basicValue":499}}}

Python Code

import numpy as np
from sklearn.manifold import TSNE as SklearnTSNE

def tsne_embed(data, n_components=2, perplexity=30, learning_rate=200, max_iter=1000, tsne_init='pca', tsne_method='barnes_hut', random_state=None):
    """
    Fit t-SNE and return a low-dimensional embedding with the final KL divergence.

    See: https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        data (list[list]): 2D array of numeric input data with rows as samples and columns as features.
        n_components (int, optional): Dimension of the embedded output space. Default is 2.
        perplexity (float, optional): Effective neighborhood size used in the t-SNE objective. Default is 30.
        learning_rate (float, optional): Positive optimizer learning rate for the embedding updates. Default is 200.
        max_iter (int, optional): Maximum number of optimization iterations. Default is 1000.
        tsne_init (str, optional): Initialization scheme for the starting embedding. Valid options: 'pca', 'random'. Default is 'pca'.
        tsne_method (str, optional): Gradient computation method used during optimization. Valid options: 'barnes_hut', 'exact'. Default is 'barnes_hut'.
        random_state (int, optional): Integer seed for deterministic initialization. Leave blank for the estimator default. Default is None.

    Returns:
        dict: Excel data type containing the fitted embedding and final KL divergence.
    """
    def py(value):
        return value.item() if isinstance(value, np.generic) else value

    def cell(value):
        value = py(value)
        if isinstance(value, bool):
            return {"type": "Boolean", "basicValue": value}
        if isinstance(value, (int, float)):
            return {"type": "Double", "basicValue": float(value)}
        return {"type": "String", "basicValue": str(value)}

    def mat(values):
        return [[cell(value) for value in row] for row in values]

    def parse_data(value):
        value = [[value]] if not isinstance(value, list) else value
        if not isinstance(value, list) or not value or not all(isinstance(row, list) and row for row in value):
            return None, "Error: data must be a non-empty 2D list"
        if len({len(row) for row in value}) != 1:
            return None, "Error: data must be a rectangular 2D list"
        data_np = np.array(value, dtype=float)
        if data_np.ndim != 2 or data_np.size == 0:
            return None, "Error: data must be a non-empty 2D list"
        if not np.isfinite(data_np).all():
            return None, "Error: data must contain only finite numeric values"
        if data_np.shape[0] < 3:
            return None, "Error: data must contain at least 3 samples"
        return data_np, None

    def orient_embedding(embedding):
        # t-SNE solutions are sign-ambiguous along each axis; flip every
        # column whose largest-magnitude entry is negative so repeated runs
        # with the same seed produce a consistent orientation.
        embedding_np = np.array(embedding, dtype=float, copy=True)
        for index in range(embedding_np.shape[1]):
            column = embedding_np[:, index]
            pivot_value = column[int(np.argmax(np.abs(column)))] if column.size else 0.0
            if pivot_value < 0:
                embedding_np[:, index] *= -1.0
        return embedding_np

    try:
        data_np, error = parse_data(data)
        if error:
            return error

        component_total = int(n_components)
        if component_total < 1:
            return "Error: n_components must be at least 1"

        perplexity_value = float(perplexity)
        if perplexity_value <= 0 or perplexity_value >= data_np.shape[0]:
            return f"Error: perplexity must be greater than 0 and less than {data_np.shape[0]}"

        learning_rate_value = float(learning_rate)
        if learning_rate_value <= 0:
            return "Error: learning_rate must be greater than 0"

        if int(max_iter) < 250:
            return "Error: max_iter must be at least 250"

        init_value = str(tsne_init).strip().lower()
        if init_value not in {"pca", "random"}:
            return "Error: tsne_init must be 'pca' or 'random'"

        method_value = str(tsne_method).strip().lower()
        if method_value not in {"barnes_hut", "exact"}:
            return "Error: tsne_method must be 'barnes_hut' or 'exact'"
        if method_value == "barnes_hut" and component_total > 3:
            return "Error: n_components must be 3 or less when tsne_method is 'barnes_hut'"

        fitted = SklearnTSNE(
            n_components=component_total,
            perplexity=perplexity_value,
            learning_rate=learning_rate_value,
            max_iter=int(max_iter),
            init=init_value,
            method=method_value,
            random_state=None if random_state in (None, "") else int(random_state)
        )

        embedding_np = orient_embedding(fitted.fit_transform(data_np))
        kl_divergence = float(fitted.kl_divergence_)
        n_iter_value = float(getattr(fitted, "n_iter_", int(max_iter)))

        return {
            "type": "Double",
            "basicValue": kl_divergence,
            "properties": {
                "kl_divergence": {"type": "Double", "basicValue": kl_divergence},
                "component_count": {"type": "Double", "basicValue": float(embedding_np.shape[1])},
                "sample_count": {"type": "Double", "basicValue": float(data_np.shape[0])},
                "feature_count": {"type": "Double", "basicValue": float(data_np.shape[1])},
                "embedding": {"type": "Array", "elements": mat(np.asarray(embedding_np, dtype=float).tolist())},
                "n_iter": {"type": "Double", "basicValue": n_iter_value}
            }
        }
    except Exception as e:
        return f"Error: {str(e)}"
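The sign convention that orient_embedding applies can be exercised in isolation. This standalone sketch (the name `orient` is ours) reproduces its column-flipping rule with NumPy only:

```python
import numpy as np

def orient(embedding):
    # Flip each column whose largest-magnitude entry is negative; negating a
    # whole axis leaves all pairwise distances in the embedding unchanged.
    e = np.array(embedding, dtype=float)
    for j in range(e.shape[1]):
        column = e[:, j]
        if column.size and column[int(np.argmax(np.abs(column)))] < 0:
            e[:, j] *= -1.0
    return e

# Column 0's pivot (-3.0) is negative, so that axis flips; column 1's
# pivot (1.0) is already positive, so it is left alone.
orient([[-3.0, 1.0], [2.0, -0.5]])  # → [[3.0, 1.0], [-2.0, -0.5]]
```

Because only signs change, this post-processing affects none of the reported quantities (KL divergence, counts, iteration count), only the orientation of the returned coordinates.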
