TSNE_EMBED
t-SNE (t-distributed stochastic neighbor embedding) maps high-dimensional samples into a low-dimensional space by preserving local neighborhoods rather than global linear structure. It is most useful for visualization-oriented exploratory analysis.
The algorithm minimizes the Kullback-Leibler divergence between the joint probabilities of the high-dimensional and low-dimensional representations:
KL(P || Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}
where P and Q represent the pairwise similarities in the original and embedded spaces respectively.
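The objective above can be sketched numerically. The following is a minimal illustration (not the estimator's internal computation, which conditions the similarities on perplexity): it evaluates KL(P || Q) over the off-diagonal pairs for two hypothetical joint-similarity matrices that are symmetric, zero on the diagonal, and sum to 1.

```python
import numpy as np

# Hypothetical pairwise-similarity matrices for 3 samples; both are valid
# joint distributions: non-negative, zero diagonal, off-diagonals sum to 1.
P = np.array([[0.00, 0.25, 0.15],
              [0.25, 0.00, 0.10],
              [0.15, 0.10, 0.00]])
Q = np.array([[0.00, 0.20, 0.20],
              [0.20, 0.00, 0.10],
              [0.20, 0.10, 0.00]])

def kl_divergence(P, Q):
    """KL(P || Q) summed over the off-diagonal pairs i != j."""
    mask = ~np.eye(P.shape[0], dtype=bool)
    return float(np.sum(P[mask] * np.log(P[mask] / Q[mask])))

print(kl_divergence(P, Q))  # small positive value; 0 only when P == Q
```

The divergence is asymmetric in P and Q, which is why t-SNE penalizes placing close neighbors far apart more heavily than the reverse.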
This wrapper accepts rows as samples and columns as features. It returns the fitted embedding together with the final KL divergence, the embedding dimension, the sample and feature counts, and the iteration count reported by the fitted estimator.
Excel Usage
=TSNE_EMBED(data, n_components, perplexity, learning_rate, max_iter, tsne_init, tsne_method, random_state)
- data (list[list], required): 2D array of numeric input data with rows as samples and columns as features.
- n_components (int, optional, default: 2): Dimension of the embedded output space.
- perplexity (float, optional, default: 30): Effective neighborhood size used in the t-SNE objective. Must be greater than 0 and less than the number of samples.
- learning_rate (float, optional, default: 200): Positive optimizer learning rate for the embedding updates.
- max_iter (int, optional, default: 1000): Maximum number of optimization iterations.
- tsne_init (str, optional, default: "pca"): Initialization scheme for the starting embedding. Valid options: "pca", "random".
- tsne_method (str, optional, default: "barnes_hut"): Gradient computation method used during optimization. Valid options: "barnes_hut", "exact".
- random_state (int, optional, default: null): Integer seed for deterministic initialization. Leave blank for the estimator default.
Returns (dict): Excel data type containing the fitted embedding and final KL divergence.
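The values the wrapper surfaces come straight from the fitted scikit-learn estimator. A minimal sketch of the underlying call, using the two-cluster data from Example 1 below (attribute names follow the `sklearn.manifold.TSNE` documentation; exact embedding coordinates vary across scikit-learn versions, so only the shapes and attributes are shown):

```python
import numpy as np
from sklearn.manifold import TSNE

# Two well-separated 2-D clusters; rows are samples, columns are features.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0],
              [5.0, 5.0], [5.0, 6.0], [6.0, 5.0], [6.0, 6.0]])

tsne = TSNE(n_components=2, perplexity=2.5, init="pca",
            method="exact", random_state=0)
embedding = tsne.fit_transform(X)

# The wrapper reads these fitted attributes into its result:
print(embedding.shape)       # (n_samples, n_components)
print(tsne.kl_divergence_)   # final KL(P || Q)
print(tsne.n_iter_)          # iterations actually run
```

Note that the embedding coordinates, and hence the Expected output blocks in the examples, are reproducible only for a fixed scikit-learn version and seed.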
Example 1: Embed two separated clusters with the exact solver
Inputs:
| data | n_components | perplexity | learning_rate | max_iter | tsne_init | tsne_method | random_state | |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 2 | 2.5 | 80 | 500 | pca | exact | 0 |
| 0 | 1 | |||||||
| 1 | 0 | |||||||
| 1 | 1 | |||||||
| 5 | 5 | |||||||
| 5 | 6 | |||||||
| 6 | 5 | |||||||
| 6 | 6 |
Excel formula:
=TSNE_EMBED({0,0;0,1;1,0;1,1;5,5;5,6;6,5;6,6}, 2, 2.5, 80, 500, "pca", "exact", 0)
Expected output:
{"type":"Double","basicValue":1.27152,"properties":{"kl_divergence":{"type":"Double","basicValue":1.27152},"component_count":{"type":"Double","basicValue":2},"sample_count":{"type":"Double","basicValue":8},"feature_count":{"type":"Double","basicValue":2},"embedding":{"type":"Array","elements":[[{"type":"Double","basicValue":308.174},{"type":"Double","basicValue":-596.145}],[{"type":"Double","basicValue":565.379},{"type":"Double","basicValue":140.845}],[{"type":"Double","basicValue":710.635},{"type":"Double","basicValue":-1128.97}],[{"type":"Double","basicValue":-343.572},{"type":"Double","basicValue":1700.94}],[{"type":"Double","basicValue":1682.58},{"type":"Double","basicValue":-627.124}],[{"type":"Double","basicValue":-1589.38},{"type":"Double","basicValue":723.269}],[{"type":"Double","basicValue":-803.921},{"type":"Double","basicValue":71.7089}],[{"type":"Double","basicValue":-435.913},{"type":"Double","basicValue":644.484}]]},"n_iter":{"type":"Double","basicValue":499}}}
Example 2: Use seeded random initialization on three compact groups
Inputs:
| data | n_components | perplexity | learning_rate | max_iter | tsne_init | tsne_method | random_state | ||
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 2 | 2 | 60 | 500 | random | exact | 3 |
| 0.2 | 0.1 | 0 | |||||||
| 5 | 5 | 5 | |||||||
| 5.1 | 5.2 | 5 | |||||||
| 10 | 0 | 10 | |||||||
| 10.2 | 0.1 | 10.1 |
Excel formula:
=TSNE_EMBED({0,0,0;0.2,0.1,0;5,5,5;5.1,5.2,5;10,0,10;10.2,0.1,10.1}, 2, 2, 60, 500, "random", "exact", 3)
Expected output:
{"type":"Double","basicValue":1.36008,"properties":{"kl_divergence":{"type":"Double","basicValue":1.36008},"component_count":{"type":"Double","basicValue":2},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":3},"embedding":{"type":"Array","elements":[[{"type":"Double","basicValue":914.209},{"type":"Double","basicValue":-274.882}],[{"type":"Double","basicValue":-890.978},{"type":"Double","basicValue":295.72}],[{"type":"Double","basicValue":-234.515},{"type":"Double","basicValue":453.258}],[{"type":"Double","basicValue":256.056},{"type":"Double","basicValue":-448.663}],[{"type":"Double","basicValue":156.849},{"type":"Double","basicValue":102.724}],[{"type":"Double","basicValue":-143.79},{"type":"Double","basicValue":-103.507}]]},"n_iter":{"type":"Double","basicValue":499}}}
Example 3: Embed one-dimensional samples into two dimensions
Inputs:
| data | n_components | perplexity | learning_rate | max_iter | tsne_init | tsne_method | random_state |
|---|---|---|---|---|---|---|---|
| 0 | 2 | 2 | 50 | 500 | random | exact | 1 |
| 1 | |||||||
| 2 | |||||||
| 3 | |||||||
| 4 | |||||||
| 5 |
Excel formula:
=TSNE_EMBED({0;1;2;3;4;5}, 2, 2, 50, 500, "random", "exact", 1)
Expected output:
{"type":"Double","basicValue":0.123383,"properties":{"kl_divergence":{"type":"Double","basicValue":0.123383},"component_count":{"type":"Double","basicValue":2},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":1},"embedding":{"type":"Array","elements":[[{"type":"Double","basicValue":-145.516},{"type":"Double","basicValue":21.1913}],[{"type":"Double","basicValue":-93.6204},{"type":"Double","basicValue":23.9552}],[{"type":"Double","basicValue":-30.9413},{"type":"Double","basicValue":25.6826}],[{"type":"Double","basicValue":37.3494},{"type":"Double","basicValue":26.9504}],[{"type":"Double","basicValue":100.051},{"type":"Double","basicValue":27.51}],[{"type":"Double","basicValue":152.001},{"type":"Double","basicValue":26.1001}]]},"n_iter":{"type":"Double","basicValue":499}}}
Example 4: Produce a three-dimensional embedding with Barnes-Hut t-SNE
Inputs:
| data | n_components | perplexity | learning_rate | max_iter | tsne_init | tsne_method | random_state | ||
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | 3 | 2.5 | 75 | 500 | pca | barnes_hut | 5 |
| 0.1 | 0.2 | 1.1 | |||||||
| 1 | 1 | 2 | |||||||
| 1.2 | 1.1 | 2.1 | |||||||
| 5 | 5 | 0 | |||||||
| 5.2 | 5.1 | 0.2 | |||||||
| 6 | 6 | 1 | |||||||
| 6.1 | 6.2 | 1.1 |
Excel formula:
=TSNE_EMBED({0,0,1;0.1,0.2,1.1;1,1,2;1.2,1.1,2.1;5,5,0;5.2,5.1,0.2;6,6,1;6.1,6.2,1.1}, 3, 2.5, 75, 500, "pca", "barnes_hut", 5)
Expected output:
{"type":"Double","basicValue":1.79411,"properties":{"kl_divergence":{"type":"Double","basicValue":1.79411},"component_count":{"type":"Double","basicValue":3},"sample_count":{"type":"Double","basicValue":8},"feature_count":{"type":"Double","basicValue":3},"embedding":{"type":"Array","elements":[[{"type":"Double","basicValue":180.549},{"type":"Double","basicValue":-89.0916},{"type":"Double","basicValue":19.9081}],[{"type":"Double","basicValue":-64.7},{"type":"Double","basicValue":68.89},{"type":"Double","basicValue":-60.3671}],[{"type":"Double","basicValue":89.9451},{"type":"Double","basicValue":12.4007},{"type":"Double","basicValue":13.6712}],[{"type":"Double","basicValue":52.8532},{"type":"Double","basicValue":3.74339},{"type":"Double","basicValue":-82.2136}],[{"type":"Double","basicValue":-29.3893},{"type":"Double","basicValue":-133.062},{"type":"Double","basicValue":3.69885}],[{"type":"Double","basicValue":-54.7223},{"type":"Double","basicValue":-38.0214},{"type":"Double","basicValue":32.0658}],[{"type":"Double","basicValue":-95.3113},{"type":"Double","basicValue":-87.4605},{"type":"Double","basicValue":133.372}],[{"type":"Double","basicValue":193.254},{"type":"Double","basicValue":649.021},{"type":"Double","basicValue":40.2834}]]},"n_iter":{"type":"Double","basicValue":499}}}
Python Code
```python
import numpy as np
from sklearn.manifold import TSNE as SklearnTSNE


def tsne_embed(data, n_components=2, perplexity=30, learning_rate=200,
               max_iter=1000, tsne_init='pca', tsne_method='barnes_hut',
               random_state=None):
    """
    Fit t-SNE and return a low-dimensional embedding with the final KL divergence.

    See: https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        data (list[list]): 2D array of numeric input data with rows as samples and columns as features.
        n_components (int, optional): Dimension of the embedded output space. Default is 2.
        perplexity (float, optional): Effective neighborhood size used in the t-SNE objective. Default is 30.
        learning_rate (float, optional): Positive optimizer learning rate for the embedding updates. Default is 200.
        max_iter (int, optional): Maximum number of optimization iterations. Default is 1000.
        tsne_init (str, optional): Initialization scheme for the starting embedding. Valid options: 'pca', 'random'. Default is 'pca'.
        tsne_method (str, optional): Gradient computation method used during optimization. Valid options: 'barnes_hut', 'exact'. Default is 'barnes_hut'.
        random_state (int, optional): Integer seed for deterministic initialization. Leave blank for the estimator default. Default is None.

    Returns:
        dict: Excel data type containing the fitted embedding and final KL divergence.
    """
    def py(value):
        # Unwrap NumPy scalars to native Python types.
        return value.item() if isinstance(value, np.generic) else value

    def cell(value):
        # Map a Python scalar to an Excel entity cell.
        value = py(value)
        if isinstance(value, bool):
            return {"type": "Boolean", "basicValue": bool(value)}
        if isinstance(value, (int, float)):
            return {"type": "Double", "basicValue": float(value)}
        return {"type": "String", "basicValue": str(value)}

    def mat(values):
        return [[cell(value) for value in row] for row in values]

    def parse_data(value):
        # Coerce a scalar to a 1x1 grid, then validate shape and contents.
        value = [[value]] if not isinstance(value, list) else value
        if not value or not all(isinstance(row, list) and row for row in value):
            return None, "Error: data must be a non-empty 2D list"
        if len({len(row) for row in value}) != 1:
            return None, "Error: data must be a rectangular 2D list"
        data_np = np.array(value, dtype=float)
        if data_np.ndim != 2 or data_np.size == 0:
            return None, "Error: data must be a non-empty 2D list"
        if not np.isfinite(data_np).all():
            return None, "Error: data must contain only finite numeric values"
        if data_np.shape[0] < 3:
            return None, "Error: data must contain at least 3 samples"
        return data_np, None

    def orient_embedding(embedding):
        # t-SNE axes are defined only up to sign; flip each column so its
        # largest-magnitude entry is positive, for reproducible output.
        embedding_np = np.array(embedding, dtype=float, copy=True)
        for index in range(embedding_np.shape[1]):
            column = embedding_np[:, index]
            pivot_value = column[int(np.argmax(np.abs(column)))] if column.size else 0.0
            if pivot_value < 0:
                embedding_np[:, index] *= -1.0
        return embedding_np

    try:
        data_np, error = parse_data(data)
        if error:
            return error
        component_total = int(n_components)
        if component_total < 1:
            return "Error: n_components must be at least 1"
        perplexity_value = float(perplexity)
        if perplexity_value <= 0 or perplexity_value >= data_np.shape[0]:
            return f"Error: perplexity must be greater than 0 and less than {data_np.shape[0]}"
        learning_rate_value = float(learning_rate)
        if learning_rate_value <= 0:
            return "Error: learning_rate must be greater than 0"
        if int(max_iter) < 250:
            return "Error: max_iter must be at least 250"
        init_value = str(tsne_init).strip().lower()
        if init_value not in {"pca", "random"}:
            return "Error: tsne_init must be 'pca' or 'random'"
        method_value = str(tsne_method).strip().lower()
        if method_value not in {"barnes_hut", "exact"}:
            return "Error: tsne_method must be 'barnes_hut' or 'exact'"
        if method_value == "barnes_hut" and component_total > 3:
            return "Error: n_components must be 3 or less when tsne_method is 'barnes_hut'"
        fitted = SklearnTSNE(
            n_components=component_total,
            perplexity=perplexity_value,
            learning_rate=learning_rate_value,
            max_iter=int(max_iter),
            init=init_value,
            method=method_value,
            random_state=None if random_state in (None, "") else int(random_state)
        )
        embedding_np = orient_embedding(fitted.fit_transform(data_np))
        kl_divergence = float(fitted.kl_divergence_)
        n_iter_value = float(getattr(fitted, "n_iter_", int(max_iter)))
        return {
            "type": "Double",
            "basicValue": kl_divergence,
            "properties": {
                "kl_divergence": {"type": "Double", "basicValue": kl_divergence},
                "component_count": {"type": "Double", "basicValue": float(embedding_np.shape[1])},
                "sample_count": {"type": "Double", "basicValue": float(data_np.shape[0])},
                "feature_count": {"type": "Double", "basicValue": float(data_np.shape[1])},
                "embedding": {"type": "Array", "elements": mat(embedding_np.tolist())},
                "n_iter": {"type": "Double", "basicValue": n_iter_value}
            }
        }
    except Exception as e:
        return f"Error: {e}"
```
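The sign-orientation step deserves a note: t-SNE solutions are invariant under reflection of any axis, so two otherwise identical runs can return mirror-image embeddings. The helper fixes each column's sign by its largest-magnitude entry. A standalone sketch of the same idea (the function name here is illustrative, not part of the wrapper's API):

```python
import numpy as np

def orient_columns(embedding):
    """Flip each column's sign so its largest-magnitude entry is positive.

    Embeddings are defined only up to per-axis reflection; pinning the sign
    of each axis makes repeated runs directly comparable."""
    out = np.array(embedding, dtype=float, copy=True)
    for j in range(out.shape[1]):
        col = out[:, j]
        if col.size and col[int(np.argmax(np.abs(col)))] < 0:
            out[:, j] *= -1.0
    return out

# Column 0 pivot is -3.0, column 1 pivot is -4.0, so both columns flip.
oriented = orient_columns([[-3.0, 1.0], [2.0, -4.0]])
print(oriented)  # [[ 3. -1.] [-2.  4.]]
```

Applying the function twice is a no-op, since after one pass every pivot is already positive.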