TSVD_REDUCE
Truncated singular value decomposition projects a matrix into a lower-dimensional linear subspace without centering the original features first. It is often used for sparse matrices, count matrices, and other high-dimensional tabular inputs where centering would be undesirable.
The decomposition approximates the data matrix X:
X \approx U \Sigma V^T
where U and V are matrices with orthogonal columns and \Sigma is a diagonal matrix containing the singular values.
This wrapper accepts rows as samples and columns as features. It returns the transformed scores together with learned component vectors, explained variance values, explained variance ratios, and singular values from the fitted decomposition.
Excel Usage
=TSVD_REDUCE(data, n_components, tsvd_algorithm, n_iter, random_state)
data(list[list], required): 2D array of numeric input data with rows as samples and columns as features.n_components(int, optional, default: 2): Number of singular vectors to retain.tsvd_algorithm(str, optional, default: “randomized”): Solver used to compute the truncated decomposition.n_iter(int, optional, default: 5): Number of power iterations used by the randomized solver.random_state(int, optional, default: null): Integer seed for reproducible randomized solver runs. Leave blank for the estimator default.
Returns (dict): Excel data type containing projected scores, component vectors, and explained variance summaries.
Example 1: Reduce a count-like matrix with randomized truncated SVD
Inputs:
| data | n_components | tsvd_algorithm | n_iter | random_state | |||
|---|---|---|---|---|---|---|---|
| 4 | 0 | 1 | 0 | 2 | randomized | 7 | 0 |
| 3 | 1 | 0 | 0 | ||||
| 0 | 2 | 4 | 1 | ||||
| 0 | 1 | 5 | 2 | ||||
| 5 | 0 | 1 | 0 | ||||
| 0 | 3 | 4 | 2 |
Excel formula:
=TSVD_REDUCE({4,0,1,0;3,1,0,0;0,2,4,1;0,1,5,2;5,0,1,0;0,3,4,2}, 2, "randomized", 7, 0)
Expected output:
{"type":"Double","basicValue":0.931194,"properties":{"explained_variance_ratio_sum":{"type":"Double","basicValue":0.931194},"component_count":{"type":"Double","basicValue":2},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":4},"scores":{"type":"Array","elements":[[{"type":"Double","basicValue":1.93405},{"type":"Double","basicValue":3.6262}],[{"type":"Double","basicValue":1.19906},{"type":"Double","basicValue":2.75861}],[{"type":"Double","basicValue":4.37825},{"type":"Double","basicValue":-1.22495}],[{"type":"Double","basicValue":5.14271},{"type":"Double","basicValue":-1.4662}],[{"type":"Double","basicValue":2.20955},{"type":"Double","basicValue":4.58516}],[{"type":"Double","basicValue":5.05578},{"type":"Double","basicValue":-1.49309}]]},"components":{"type":"Array","elements":[[{"type":"Double","basicValue":0.275501},{"type":"Double","basicValue":0.372556},{"type":"Double","basicValue":0.832042},{"type":"Double","basicValue":0.304972}],[{"type":"Double","basicValue":0.958959},{"type":"Double","basicValue":-0.118264},{"type":"Double","basicValue":-0.209636},{"type":"Double","basicValue":-0.149877}]]},"explained_variance":{"type":"Array","elements":[[{"type":"Double","basicValue":2.51791}],[{"type":"Double","basicValue":6.6647}]]},"explained_variance_ratio":{"type":"Array","elements":[[{"type":"Double","basicValue":0.255337}],[{"type":"Double","basicValue":0.675857}]]},"singular_values":{"type":"Array","elements":[[{"type":"Double","basicValue":9.0132}],[{"type":"Double","basicValue":6.90381}]]}}}
Example 2: Use the ARPACK solver on a compact dense matrix
Inputs:
| data | n_components | tsvd_algorithm | n_iter | random_state | ||
|---|---|---|---|---|---|---|
| 1 | 2 | 0 | 2 | arpack | 5 | 0 |
| 2 | 3 | 1 | ||||
| 3 | 5 | 2 | ||||
| 4 | 6 | 3 | ||||
| 5 | 8 | 5 |
Excel formula:
=TSVD_REDUCE({1,2,0;2,3,1;3,5,2;4,6,3;5,8,5}, 2, "arpack", 5, 0)
Expected output:
{"type":"Double","basicValue":0.997991,"properties":{"explained_variance_ratio_sum":{"type":"Double","basicValue":0.997991},"component_count":{"type":"Double","basicValue":2},"sample_count":{"type":"Double","basicValue":5},"feature_count":{"type":"Double","basicValue":3},"scores":{"type":"Array","elements":[[{"type":"Double","basicValue":2.03633},{"type":"Double","basicValue":-0.912029}],[{"type":"Double","basicValue":3.70179},{"type":"Double","basicValue":-0.528017}],[{"type":"Double","basicValue":6.14111},{"type":"Double","basicValue":-0.528017}],[{"type":"Double","basicValue":7.80657},{"type":"Double","basicValue":-0.144005}],[{"type":"Double","basicValue":10.6489},{"type":"Double","basicValue":0.768025}]]},"components":{"type":"Array","elements":[[{"type":"Double","basicValue":0.488612},{"type":"Double","basicValue":0.773861},{"type":"Double","basicValue":0.402986}],[{"type":"Double","basicValue":-0.144005},{"type":"Double","basicValue":-0.384012},{"type":"Double","basicValue":0.912029}]]},"explained_variance":{"type":"Array","elements":[[{"type":"Double","basicValue":9.17313}],[{"type":"Double","basicValue":0.327742}]]},"explained_variance_ratio":{"type":"Array","elements":[[{"type":"Double","basicValue":0.963564}],[{"type":"Double","basicValue":0.0344267}]]},"singular_values":{"type":"Array","elements":[[{"type":"Double","basicValue":15.1626}],[{"type":"Double","basicValue":1.41421}]]}}}
Example 3: Keep only one singular vector from a two-feature matrix
Inputs:
| data | n_components | tsvd_algorithm | n_iter | random_state | |
|---|---|---|---|---|---|
| 1 | 0 | 1 | randomized | 5 | 3 |
| 2 | 1 | ||||
| 3 | 1 | ||||
| 4 | 2 | ||||
| 5 | 3 |
Excel formula:
=TSVD_REDUCE({1,0;2,1;3,1;4,2;5,3}, 1, "randomized", 5, 3)
Expected output:
{"type":"Double","basicValue":0.96652,"properties":{"explained_variance_ratio_sum":{"type":"Double","basicValue":0.96652},"component_count":{"type":"Double","basicValue":1},"sample_count":{"type":"Double","basicValue":5},"feature_count":{"type":"Double","basicValue":2},"scores":{"type":"Array","elements":[[{"type":"Double","basicValue":0.889168}],[{"type":"Double","basicValue":2.23592}],[{"type":"Double","basicValue":3.12508}],[{"type":"Double","basicValue":4.47183}],[{"type":"Double","basicValue":5.81858}]]},"components":{"type":"Array","elements":[[{"type":"Double","basicValue":0.889168},{"type":"Double","basicValue":0.457582}]]},"explained_variance":{"type":"Array","elements":[[{"type":"Double","basicValue":2.93822}]]},"explained_variance_ratio":{"type":"Array","elements":[[{"type":"Double","basicValue":0.96652}]]},"singular_values":{"type":"Array","elements":[[{"type":"Double","basicValue":8.33122}]]}}}
Example 4: Retain three components from a wider dense matrix
Inputs:
| data | n_components | tsvd_algorithm | n_iter | random_state | ||||
|---|---|---|---|---|---|---|---|---|
| 2 | 1 | 0 | 3 | 1 | 3 | randomized | 9 | 11 |
| 3 | 1 | 1 | 4 | 1 | ||||
| 4 | 2 | 1 | 5 | 2 | ||||
| 5 | 3 | 2 | 6 | 3 | ||||
| 6 | 5 | 3 | 8 | 5 | ||||
| 7 | 8 | 5 | 11 | 8 |
Excel formula:
=TSVD_REDUCE({2,1,0,3,1;3,1,1,4,1;4,2,1,5,2;5,3,2,6,3;6,5,3,8,5;7,8,5,11,8}, 3, "randomized", 9, 11)
Expected output:
{"type":"Double","basicValue":0.997767,"properties":{"explained_variance_ratio_sum":{"type":"Double","basicValue":0.997767},"component_count":{"type":"Double","basicValue":3},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":5},"scores":{"type":"Array","elements":[[{"type":"Double","basicValue":3.64627},{"type":"Double","basicValue":1.13513},{"type":"Double","basicValue":-0.644274}],[{"type":"Double","basicValue":4.99393},{"type":"Double","basicValue":1.68159},{"type":"Double","basicValue":0.159584}],[{"type":"Double","basicValue":6.88991},{"type":"Double","basicValue":1.58207},{"type":"Double","basicValue":-0.121217}],[{"type":"Double","basicValue":9.02932},{"type":"Double","basicValue":1.14663},{"type":"Double","basicValue":0.357838}],[{"type":"Double","basicValue":12.6065},{"type":"Double","basicValue":0.0815798},{"type":"Double","basicValue":0.104611}],[{"type":"Double","basicValue":17.8649},{"type":"Double","basicValue":-1.949},{"type":"Double","basicValue":-0.121041}]]},"components":{"type":"Array","elements":[[{"type":"Double","basicValue":0.458187},{"type":"Double","basicValue":0.395869},{"type":"Double","basicValue":0.243427},{"type":"Double","basicValue":0.646052},{"type":"Double","basicValue":0.395869}],[{"type":"Double","basicValue":0.530095},{"type":"Double","basicValue":-0.490949},{"type":"Double","basicValue":-0.335912},{"type":"Double","basicValue":0.352279},{"type":"Double","basicValue":-0.490949}],[{"type":"Double","basicValue":0.45148},{"type":"Double","basicValue":-0.162402},{"type":"Double","basicValue":0.759855},{"type":"Double","basicValue":-0.407477},{"type":"Double","basicValue":-0.162402}]]},"explained_variance":{"type":"Array","elements":[[{"type":"Double","basicValue":23.4302}],[{"type":"Double","basicValue":1.58077}],[{"type":"Double","basicValue":0.0995387}]]},"explained_variance_ratio":{"type":"Array","elements":[[{"type":"Double","basicValue":0.931}],[{"type":"Double","basicValue":0.0628122}],[{"type":"Double","basicValue":0.00395518}]]},"singular_values":{"type":"Array","elements":[[{"type":"Double","basicValue":25.403}],[{"type":"Double","basicValue":3.42626}],[{"type":"Double","basicValue":0.780315}]]}}}
Python Code
import numpy as np
from sklearn.decomposition import TruncatedSVD as SklearnTruncatedSVD
def tsvd_reduce(data, n_components=2, tsvd_algorithm='randomized', n_iter=5, random_state=None):
"""
Fit truncated singular value decomposition and return projected scores with variance summaries.
See: https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.TruncatedSVD.html
This example function is provided as-is without any representation of accuracy.
Args:
data (list[list]): 2D array of numeric input data with rows as samples and columns as features.
n_components (int, optional): Number of singular vectors to retain. Default is 2.
tsvd_algorithm (str, optional): Solver used to compute the truncated decomposition. Valid options: Randomized, Arpack. Default is 'randomized'.
n_iter (int, optional): Number of power iterations used by the randomized solver. Default is 5.
random_state (int, optional): Integer seed for reproducible randomized solver runs. Leave blank for the estimator default. Default is None.
Returns:
dict: Excel data type containing projected scores, component vectors, and explained variance summaries.
"""
def py(value):
return value.item() if isinstance(value, np.generic) else value
def cell(value):
value = py(value)
if isinstance(value, bool):
return {"type": "Boolean", "basicValue": bool(value)}
if isinstance(value, (int, float)) and not isinstance(value, bool):
return {"type": "Double", "basicValue": float(value)}
return {"type": "String", "basicValue": str(value)}
def col(values):
return [[cell(value)] for value in values]
def mat(values):
return [[cell(value) for value in row] for row in values]
def parse_data(value):
value = [[value]] if not isinstance(value, list) else value
if not isinstance(value, list) or not value or not all(isinstance(row, list) and row for row in value):
return None, "Error: data must be a non-empty 2D list"
if len({len(row) for row in value}) != 1:
return None, "Error: data must be a rectangular 2D list"
data_np = np.array(value, dtype=float)
if data_np.ndim != 2 or data_np.size == 0:
return None, "Error: data must be a non-empty 2D list"
if not np.isfinite(data_np).all():
return None, "Error: data must contain only finite numeric values"
if data_np.shape[0] < 2:
return None, "Error: data must contain at least 2 samples"
return data_np, None
def orient_projection(scores, components):
score_np = np.array(scores, dtype=float, copy=True)
component_np = np.array(components, dtype=float, copy=True)
limit = min(score_np.shape[1], component_np.shape[0])
for index in range(limit):
component_row = component_np[index, :]
pivot = int(np.argmax(np.abs(component_row)))
pivot_value = component_row[pivot]
if pivot_value == 0 and score_np.shape[0] > 0:
score_column = score_np[:, index]
pivot_value = score_column[int(np.argmax(np.abs(score_column)))]
if pivot_value < 0:
component_np[index, :] *= -1.0
score_np[:, index] *= -1.0
return score_np, component_np
try:
data_np, error = parse_data(data)
if error:
return error
component_total = int(n_components)
max_components = min(data_np.shape[0], data_np.shape[1])
if component_total < 1 or component_total > max_components:
return f"Error: n_components must be between 1 and {max_components}"
algorithm_value = str(tsvd_algorithm).strip().lower()
if algorithm_value not in {"randomized", "arpack"}:
return "Error: tsvd_algorithm must be 'randomized' or 'arpack'"
if algorithm_value == "arpack" and component_total >= max_components:
return "Error: n_components must be below min(n_samples, n_features) when tsvd_algorithm is 'arpack'"
if int(n_iter) < 1:
return "Error: n_iter must be at least 1"
fitted = SklearnTruncatedSVD(
n_components=component_total,
algorithm=algorithm_value,
n_iter=int(n_iter),
random_state=None if random_state in (None, "") else int(random_state)
)
scores_np = np.asarray(fitted.fit_transform(data_np), dtype=float)
components_np = np.asarray(fitted.components_, dtype=float)
scores_np, components_np = orient_projection(scores_np, components_np)
explained_variance = np.atleast_1d(np.asarray(fitted.explained_variance_, dtype=float))
variance_ratio = np.atleast_1d(np.asarray(fitted.explained_variance_ratio_, dtype=float))
singular_values = np.atleast_1d(np.asarray(fitted.singular_values_, dtype=float))
variance_ratio_sum = float(np.sum(variance_ratio))
return {
"type": "Double",
"basicValue": variance_ratio_sum,
"properties": {
"explained_variance_ratio_sum": {"type": "Double", "basicValue": variance_ratio_sum},
"component_count": {"type": "Double", "basicValue": float(components_np.shape[0])},
"sample_count": {"type": "Double", "basicValue": float(data_np.shape[0])},
"feature_count": {"type": "Double", "basicValue": float(data_np.shape[1])},
"scores": {"type": "Array", "elements": mat(scores_np.tolist())},
"components": {"type": "Array", "elements": mat(components_np.tolist())},
"explained_variance": {"type": "Array", "elements": col(explained_variance.tolist())},
"explained_variance_ratio": {"type": "Array", "elements": col(variance_ratio.tolist())},
"singular_values": {"type": "Array", "elements": col(singular_values.tolist())}
}
}
except Exception as e:
return f"Error: {str(e)}"