TSVD_REDUCE

Truncated singular value decomposition projects a matrix into a lower-dimensional linear subspace without centering the original features first. It is often used for sparse matrices, count matrices, and other high-dimensional tabular inputs where centering would be undesirable.

The decomposition approximates the data matrix X:

X \approx U \Sigma V^T

where U and V are matrices with orthogonal columns and \Sigma is a diagonal matrix containing the singular values.

This wrapper accepts rows as samples and columns as features. It returns the transformed scores together with learned component vectors, explained variance values, explained variance ratios, and singular values from the fitted decomposition.

Excel Usage

=TSVD_REDUCE(data, n_components, tsvd_algorithm, n_iter, random_state)
  • data (list[list], required): 2D array of numeric input data with rows as samples and columns as features.
  • n_components (int, optional, default: 2): Number of singular vectors to retain.
  • tsvd_algorithm (str, optional, default: “randomized”): Solver used to compute the truncated decomposition.
  • n_iter (int, optional, default: 5): Number of power iterations used by the randomized solver.
  • random_state (int, optional, default: null): Integer seed for reproducible randomized solver runs. Leave blank for the estimator default.

Returns (dict): Excel data type containing projected scores, component vectors, and explained variance summaries.

Example 1: Reduce a count-like matrix with randomized truncated SVD

Inputs:

data n_components tsvd_algorithm n_iter random_state
4 0 1 0 2 randomized 7 0
3 1 0 0
0 2 4 1
0 1 5 2
5 0 1 0
0 3 4 2

Excel formula:

=TSVD_REDUCE({4,0,1,0;3,1,0,0;0,2,4,1;0,1,5,2;5,0,1,0;0,3,4,2}, 2, "randomized", 7, 0)

Expected output:

{"type":"Double","basicValue":0.931194,"properties":{"explained_variance_ratio_sum":{"type":"Double","basicValue":0.931194},"component_count":{"type":"Double","basicValue":2},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":4},"scores":{"type":"Array","elements":[[{"type":"Double","basicValue":1.93405},{"type":"Double","basicValue":3.6262}],[{"type":"Double","basicValue":1.19906},{"type":"Double","basicValue":2.75861}],[{"type":"Double","basicValue":4.37825},{"type":"Double","basicValue":-1.22495}],[{"type":"Double","basicValue":5.14271},{"type":"Double","basicValue":-1.4662}],[{"type":"Double","basicValue":2.20955},{"type":"Double","basicValue":4.58516}],[{"type":"Double","basicValue":5.05578},{"type":"Double","basicValue":-1.49309}]]},"components":{"type":"Array","elements":[[{"type":"Double","basicValue":0.275501},{"type":"Double","basicValue":0.372556},{"type":"Double","basicValue":0.832042},{"type":"Double","basicValue":0.304972}],[{"type":"Double","basicValue":0.958959},{"type":"Double","basicValue":-0.118264},{"type":"Double","basicValue":-0.209636},{"type":"Double","basicValue":-0.149877}]]},"explained_variance":{"type":"Array","elements":[[{"type":"Double","basicValue":2.51791}],[{"type":"Double","basicValue":6.6647}]]},"explained_variance_ratio":{"type":"Array","elements":[[{"type":"Double","basicValue":0.255337}],[{"type":"Double","basicValue":0.675857}]]},"singular_values":{"type":"Array","elements":[[{"type":"Double","basicValue":9.0132}],[{"type":"Double","basicValue":6.90381}]]}}}

Example 2: Use the ARPACK solver on a compact dense matrix

Inputs:

data n_components tsvd_algorithm n_iter random_state
1 2 0 2 arpack 5 0
2 3 1
3 5 2
4 6 3
5 8 5

Excel formula:

=TSVD_REDUCE({1,2,0;2,3,1;3,5,2;4,6,3;5,8,5}, 2, "arpack", 5, 0)

Expected output:

{"type":"Double","basicValue":0.997991,"properties":{"explained_variance_ratio_sum":{"type":"Double","basicValue":0.997991},"component_count":{"type":"Double","basicValue":2},"sample_count":{"type":"Double","basicValue":5},"feature_count":{"type":"Double","basicValue":3},"scores":{"type":"Array","elements":[[{"type":"Double","basicValue":2.03633},{"type":"Double","basicValue":-0.912029}],[{"type":"Double","basicValue":3.70179},{"type":"Double","basicValue":-0.528017}],[{"type":"Double","basicValue":6.14111},{"type":"Double","basicValue":-0.528017}],[{"type":"Double","basicValue":7.80657},{"type":"Double","basicValue":-0.144005}],[{"type":"Double","basicValue":10.6489},{"type":"Double","basicValue":0.768025}]]},"components":{"type":"Array","elements":[[{"type":"Double","basicValue":0.488612},{"type":"Double","basicValue":0.773861},{"type":"Double","basicValue":0.402986}],[{"type":"Double","basicValue":-0.144005},{"type":"Double","basicValue":-0.384012},{"type":"Double","basicValue":0.912029}]]},"explained_variance":{"type":"Array","elements":[[{"type":"Double","basicValue":9.17313}],[{"type":"Double","basicValue":0.327742}]]},"explained_variance_ratio":{"type":"Array","elements":[[{"type":"Double","basicValue":0.963564}],[{"type":"Double","basicValue":0.0344267}]]},"singular_values":{"type":"Array","elements":[[{"type":"Double","basicValue":15.1626}],[{"type":"Double","basicValue":1.41421}]]}}}

Example 3: Keep only one singular vector from a two-feature matrix

Inputs:

data n_components tsvd_algorithm n_iter random_state
1 0 1 randomized 5 3
2 1
3 1
4 2
5 3

Excel formula:

=TSVD_REDUCE({1,0;2,1;3,1;4,2;5,3}, 1, "randomized", 5, 3)

Expected output:

{"type":"Double","basicValue":0.96652,"properties":{"explained_variance_ratio_sum":{"type":"Double","basicValue":0.96652},"component_count":{"type":"Double","basicValue":1},"sample_count":{"type":"Double","basicValue":5},"feature_count":{"type":"Double","basicValue":2},"scores":{"type":"Array","elements":[[{"type":"Double","basicValue":0.889168}],[{"type":"Double","basicValue":2.23592}],[{"type":"Double","basicValue":3.12508}],[{"type":"Double","basicValue":4.47183}],[{"type":"Double","basicValue":5.81858}]]},"components":{"type":"Array","elements":[[{"type":"Double","basicValue":0.889168},{"type":"Double","basicValue":0.457582}]]},"explained_variance":{"type":"Array","elements":[[{"type":"Double","basicValue":2.93822}]]},"explained_variance_ratio":{"type":"Array","elements":[[{"type":"Double","basicValue":0.96652}]]},"singular_values":{"type":"Array","elements":[[{"type":"Double","basicValue":8.33122}]]}}}

Example 4: Retain three components from a wider dense matrix

Inputs:

data n_components tsvd_algorithm n_iter random_state
2 1 0 3 1 3 randomized 9 11
3 1 1 4 1
4 2 1 5 2
5 3 2 6 3
6 5 3 8 5
7 8 5 11 8

Excel formula:

=TSVD_REDUCE({2,1,0,3,1;3,1,1,4,1;4,2,1,5,2;5,3,2,6,3;6,5,3,8,5;7,8,5,11,8}, 3, "randomized", 9, 11)

Expected output:

{"type":"Double","basicValue":0.997767,"properties":{"explained_variance_ratio_sum":{"type":"Double","basicValue":0.997767},"component_count":{"type":"Double","basicValue":3},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":5},"scores":{"type":"Array","elements":[[{"type":"Double","basicValue":3.64627},{"type":"Double","basicValue":1.13513},{"type":"Double","basicValue":-0.644274}],[{"type":"Double","basicValue":4.99393},{"type":"Double","basicValue":1.68159},{"type":"Double","basicValue":0.159584}],[{"type":"Double","basicValue":6.88991},{"type":"Double","basicValue":1.58207},{"type":"Double","basicValue":-0.121217}],[{"type":"Double","basicValue":9.02932},{"type":"Double","basicValue":1.14663},{"type":"Double","basicValue":0.357838}],[{"type":"Double","basicValue":12.6065},{"type":"Double","basicValue":0.0815798},{"type":"Double","basicValue":0.104611}],[{"type":"Double","basicValue":17.8649},{"type":"Double","basicValue":-1.949},{"type":"Double","basicValue":-0.121041}]]},"components":{"type":"Array","elements":[[{"type":"Double","basicValue":0.458187},{"type":"Double","basicValue":0.395869},{"type":"Double","basicValue":0.243427},{"type":"Double","basicValue":0.646052},{"type":"Double","basicValue":0.395869}],[{"type":"Double","basicValue":0.530095},{"type":"Double","basicValue":-0.490949},{"type":"Double","basicValue":-0.335912},{"type":"Double","basicValue":0.352279},{"type":"Double","basicValue":-0.490949}],[{"type":"Double","basicValue":0.45148},{"type":"Double","basicValue":-0.162402},{"type":"Double","basicValue":0.759855},{"type":"Double","basicValue":-0.407477},{"type":"Double","basicValue":-0.162402}]]},"explained_variance":{"type":"Array","elements":[[{"type":"Double","basicValue":23.4302}],[{"type":"Double","basicValue":1.58077}],[{"type":"Double","basicValue":0.0995387}]]},"explained_variance_ratio":{"type":"Array","elements":[[{"type":"Double","basicValue":0.931}],[{"type":"Double","basicValue":0.0628122}],[{"type":"Double","basicValue":0.00395518}]]},"singular_values":{"type":"Array","elements":[[{"type":"Double","basicValue":25.403}],[{"type":"Double","basicValue":3.42626}],[{"type":"Double","basicValue":0.780315}]]}}}

Python Code

import numpy as np
from sklearn.decomposition import TruncatedSVD as SklearnTruncatedSVD

def tsvd_reduce(data, n_components=2, tsvd_algorithm='randomized', n_iter=5, random_state=None):
    """
    Fit truncated singular value decomposition and return projected scores with variance summaries.

    See: https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.TruncatedSVD.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        data (list[list]): 2D array of numeric input data with rows as samples and columns as features.
        n_components (int, optional): Number of singular vectors to retain. Default is 2.
        tsvd_algorithm (str, optional): Solver used to compute the truncated decomposition. Valid options: Randomized, Arpack. Default is 'randomized'.
        n_iter (int, optional): Number of power iterations used by the randomized solver. Default is 5.
        random_state (int, optional): Integer seed for reproducible randomized solver runs. Leave blank for the estimator default. Default is None.

    Returns:
        dict: Excel data type containing projected scores, component vectors, and explained variance summaries.
    """
    def py(value):
        return value.item() if isinstance(value, np.generic) else value

    def cell(value):
        value = py(value)
        if isinstance(value, bool):
            return {"type": "Boolean", "basicValue": bool(value)}
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            return {"type": "Double", "basicValue": float(value)}
        return {"type": "String", "basicValue": str(value)}

    def col(values):
        return [[cell(value)] for value in values]

    def mat(values):
        return [[cell(value) for value in row] for row in values]

    def parse_data(value):
        value = [[value]] if not isinstance(value, list) else value
        if not isinstance(value, list) or not value or not all(isinstance(row, list) and row for row in value):
            return None, "Error: data must be a non-empty 2D list"
        if len({len(row) for row in value}) != 1:
            return None, "Error: data must be a rectangular 2D list"
        data_np = np.array(value, dtype=float)
        if data_np.ndim != 2 or data_np.size == 0:
            return None, "Error: data must be a non-empty 2D list"
        if not np.isfinite(data_np).all():
            return None, "Error: data must contain only finite numeric values"
        if data_np.shape[0] < 2:
            return None, "Error: data must contain at least 2 samples"
        return data_np, None

    def orient_projection(scores, components):
        score_np = np.array(scores, dtype=float, copy=True)
        component_np = np.array(components, dtype=float, copy=True)
        limit = min(score_np.shape[1], component_np.shape[0])
        for index in range(limit):
            component_row = component_np[index, :]
            pivot = int(np.argmax(np.abs(component_row)))
            pivot_value = component_row[pivot]
            if pivot_value == 0 and score_np.shape[0] > 0:
                score_column = score_np[:, index]
                pivot_value = score_column[int(np.argmax(np.abs(score_column)))]
            if pivot_value < 0:
                component_np[index, :] *= -1.0
                score_np[:, index] *= -1.0
        return score_np, component_np

    try:
        data_np, error = parse_data(data)
        if error:
            return error

        component_total = int(n_components)
        max_components = min(data_np.shape[0], data_np.shape[1])
        if component_total < 1 or component_total > max_components:
            return f"Error: n_components must be between 1 and {max_components}"

        algorithm_value = str(tsvd_algorithm).strip().lower()
        if algorithm_value not in {"randomized", "arpack"}:
            return "Error: tsvd_algorithm must be 'randomized' or 'arpack'"
        if algorithm_value == "arpack" and component_total >= max_components:
            return "Error: n_components must be below min(n_samples, n_features) when tsvd_algorithm is 'arpack'"
        if int(n_iter) < 1:
            return "Error: n_iter must be at least 1"

        fitted = SklearnTruncatedSVD(
            n_components=component_total,
            algorithm=algorithm_value,
            n_iter=int(n_iter),
            random_state=None if random_state in (None, "") else int(random_state)
        )

        scores_np = np.asarray(fitted.fit_transform(data_np), dtype=float)
        components_np = np.asarray(fitted.components_, dtype=float)
        scores_np, components_np = orient_projection(scores_np, components_np)
        explained_variance = np.atleast_1d(np.asarray(fitted.explained_variance_, dtype=float))
        variance_ratio = np.atleast_1d(np.asarray(fitted.explained_variance_ratio_, dtype=float))
        singular_values = np.atleast_1d(np.asarray(fitted.singular_values_, dtype=float))
        variance_ratio_sum = float(np.sum(variance_ratio))

        return {
            "type": "Double",
            "basicValue": variance_ratio_sum,
            "properties": {
                "explained_variance_ratio_sum": {"type": "Double", "basicValue": variance_ratio_sum},
                "component_count": {"type": "Double", "basicValue": float(components_np.shape[0])},
                "sample_count": {"type": "Double", "basicValue": float(data_np.shape[0])},
                "feature_count": {"type": "Double", "basicValue": float(data_np.shape[1])},
                "scores": {"type": "Array", "elements": mat(scores_np.tolist())},
                "components": {"type": "Array", "elements": mat(components_np.tolist())},
                "explained_variance": {"type": "Array", "elements": col(explained_variance.tolist())},
                "explained_variance_ratio": {"type": "Array", "elements": col(variance_ratio.tolist())},
                "singular_values": {"type": "Array", "elements": col(singular_values.tolist())}
            }
        }
    except Exception as e:
        return f"Error: {str(e)}"

Online Calculator

2D array of numeric input data with rows as samples and columns as features.
Number of singular vectors to retain.
Solver used to compute the truncated decomposition.
Number of power iterations used by the randomized solver.
Integer seed for reproducible randomized solver runs. Leave blank for the estimator default.