PCA_REDUCE

Principal component analysis projects tabular data onto orthogonal directions that capture the largest possible variance in descending order. It is commonly used to compress correlated features into a smaller set of informative components.

For a centered data matrix X, PCA factorizes the data as:

X = T P^T

where T holds the principal component scores (the projection of X onto the principal axes) and P holds the orthonormal principal axes (loadings).

This wrapper accepts rows as samples and columns as features. It returns the transformed component scores together with component loadings, explained variance values, explained variance ratios, and singular values for the fitted projection.
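The factorization can be checked directly with scikit-learn (the library this wrapper builds on): when all components are kept, the centered data is exactly reconstructed from the scores and loadings. A minimal sketch:

```python
import numpy as np
from sklearn.decomposition import PCA

# Illustrative check of X_c = T P^T: with all components kept,
# the centered data is exactly recovered from scores and loadings.
X = np.array([[2.0, 1.0, 0.0], [3.0, 1.0, 1.0],
              [4.0, 2.0, 1.0], [5.0, 3.0, 2.0]])
pca = PCA()                        # keep all components
T = pca.fit_transform(X)           # scores
P = pca.components_                # loadings, one component per row
X_centered = X - X.mean(axis=0)
print(np.allclose(X_centered, T @ P))  # True
```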

Excel Usage

=PCA_REDUCE(data, n_components, whiten, pca_solver, random_state)
  • data (list[list], required): 2D array of numeric input data with rows as samples and columns as features.
  • n_components (int, optional, default: null): Number of principal components to keep. Leave blank to keep the estimator default.
  • whiten (bool, optional, default: false): Whether to rescale components to unit variance after projection.
  • pca_solver (str, optional, default: "auto"): Singular value decomposition strategy used to fit the PCA model. Valid options: "auto", "full", "randomized", "arpack".
  • random_state (int, optional, default: null): Integer seed used when the randomized solver path is active. Leave blank for the estimator default.

Returns (dict): Excel data type containing projected scores, component loadings, and explained variance summaries.
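The pca_solver values map directly onto scikit-learn's svd_solver argument. On well-conditioned data, all four solvers agree on the leading variance ratios; a quick sketch (note that "arpack" needs n_components strictly below min(n_samples, n_features)):

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.random.default_rng(0).normal(size=(30, 5))
ratios = {}
for solver in ("auto", "full", "randomized", "arpack"):
    # 'arpack' requires n_components < min(n_samples, n_features)
    pca = PCA(n_components=2, svd_solver=solver, random_state=0)
    pca.fit(X)
    ratios[solver] = pca.explained_variance_ratio_
# all solvers agree on the leading components for well-conditioned data
print(all(np.allclose(ratios["full"], r) for r in ratios.values()))  # True
```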

Example 1: Reduce a dense three-feature matrix with the full solver

Inputs:

data     n_components  whiten  pca_solver  random_state
2  1  0  2             FALSE   full        0
3  1  1
4  2  1
5  3  2
6  4  3
7  5  4

Excel formula:

=PCA_REDUCE({2,1,0;3,1,1;4,2,1;5,3,2;6,4,3;7,5,4}, 2, FALSE, "full", 0)

Expected output:

{"type":"Double","basicValue":0.994454,"properties":{"explained_variance_ratio_sum":{"type":"Double","basicValue":0.994454},"component_count":{"type":"Double","basicValue":2},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":3},"scores":{"type":"Array","elements":[[{"type":"Double","basicValue":-3.49841},{"type":"Double","basicValue":0.277463}],[{"type":"Double","basicValue":-2.34023},{"type":"Double","basicValue":-0.495054}],[{"type":"Double","basicValue":-1.12509},{"type":"Double","basicValue":0.142399}],[{"type":"Double","basicValue":0.598075},{"type":"Double","basicValue":0.0837313}],[{"type":"Double","basicValue":2.32124},{"type":"Double","basicValue":0.0250639}],[{"type":"Double","basicValue":4.04441},{"type":"Double","basicValue":-0.0336034}]]},"components":{"type":"Array","elements":[[{"type":"Double","basicValue":0.650144},{"type":"Double","basicValue":0.564993},{"type":"Double","basicValue":0.508031}],[{"type":"Double","basicValue":-0.0763973},{"type":"Double","basicValue":0.713849},{"type":"Double","basicValue":-0.69612}]]},"explained_variance_ratio":{"type":"Array","elements":[[{"type":"Double","basicValue":0.986028}],[{"type":"Double","basicValue":0.00842663}]]},"explained_variance":{"type":"Array","elements":[[{"type":"Double","basicValue":8.2169}],[{"type":"Double","basicValue":0.0702219}]]},"singular_values":{"type":"Array","elements":[[{"type":"Double","basicValue":6.40972}],[{"type":"Double","basicValue":0.592545}]]},"feature_means":{"type":"Array","elements":[[{"type":"Double","basicValue":4.5}],[{"type":"Double","basicValue":2.66667}],[{"type":"Double","basicValue":1.83333}]]}}}
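As a cross-check, the variance summary above can be reproduced with scikit-learn directly; this sketch verifies only the explained-variance-ratio sum, not the full return shape:

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.array([[2, 1, 0], [3, 1, 1], [4, 2, 1],
              [5, 3, 2], [6, 4, 3], [7, 5, 4]], dtype=float)
pca = PCA(n_components=2, svd_solver="full")
pca.fit(X)
ratio_sum = pca.explained_variance_ratio_.sum()
# ratio_sum is approximately 0.9945, matching explained_variance_ratio_sum
```

Signs of individual scores may differ from raw scikit-learn output because pca_reduce applies its own sign orientation to each component (see orient_projection in the Python code below).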

Example 2: Whiten a centered matrix into two principal components

Inputs:

data      n_components  whiten  pca_solver  random_state
1  2   3  2             TRUE    full        0
2  4   5
3  6   7
4  8   9
5  10  11
6  12  13

Excel formula:

=PCA_REDUCE({1,2,3;2,4,5;3,6,7;4,8,9;5,10,11;6,12,13}, 2, TRUE, "full", 0)

Expected output:

{"type":"Double","basicValue":1,"properties":{"explained_variance_ratio_sum":{"type":"Double","basicValue":1},"component_count":{"type":"Double","basicValue":2},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":3},"scores":{"type":"Array","elements":[[{"type":"Double","basicValue":-1.33631},{"type":"Double","basicValue":-1.78183}],[{"type":"Double","basicValue":-0.801784},{"type":"Double","basicValue":0.503593}],[{"type":"Double","basicValue":-0.267261},{"type":"Double","basicValue":0.0925721}],[{"type":"Double","basicValue":0.267261},{"type":"Double","basicValue":-0.0925721}],[{"type":"Double","basicValue":0.801784},{"type":"Double","basicValue":-0.503593}],[{"type":"Double","basicValue":1.33631},{"type":"Double","basicValue":-1.14049}]]},"components":{"type":"Array","elements":[[{"type":"Double","basicValue":0.333333},{"type":"Double","basicValue":0.666667},{"type":"Double","basicValue":0.666667}],[{"type":"Double","basicValue":0.942809},{"type":"Double","basicValue":-0.235702},{"type":"Double","basicValue":-0.235702}]]},"explained_variance_ratio":{"type":"Array","elements":[[{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":1.70433e-33}]]},"explained_variance":{"type":"Array","elements":[[{"type":"Double","basicValue":31.5}],[{"type":"Double","basicValue":5.36864e-32}]]},"singular_values":{"type":"Array","elements":[[{"type":"Double","basicValue":12.5499}],[{"type":"Double","basicValue":5.18104e-16}]]},"feature_means":{"type":"Array","elements":[[{"type":"Double","basicValue":3.5}],[{"type":"Double","basicValue":7}],[{"type":"Double","basicValue":8}]]}}}
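With whiten=TRUE, each retained component's scores are rescaled to unit sample variance; a sketch on well-conditioned random data:

```python
import numpy as np
from sklearn.decomposition import PCA

# Whitened scores have unit variance (ddof=1) per component.
X = np.random.default_rng(1).normal(size=(50, 4))
T = PCA(n_components=3, whiten=True).fit_transform(X)
print(np.allclose(T.std(axis=0, ddof=1), 1.0))  # True
```

Note that in Example 2 the second component's explained variance is numerically zero, so its whitened scores are dominated by floating-point noise; whitening is only meaningful for components with genuinely nonzero variance.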

Example 3: Reduce one-feature data to a single component

Inputs:

data  n_components  whiten  pca_solver  random_state
1     1             FALSE   full        0
2
4
7
11

Excel formula:

=PCA_REDUCE({1;2;4;7;11}, 1, FALSE, "full", 0)

Expected output:

{"type":"Double","basicValue":1,"properties":{"explained_variance_ratio_sum":{"type":"Double","basicValue":1},"component_count":{"type":"Double","basicValue":1},"sample_count":{"type":"Double","basicValue":5},"feature_count":{"type":"Double","basicValue":1},"scores":{"type":"Array","elements":[[{"type":"Double","basicValue":-4}],[{"type":"Double","basicValue":-3}],[{"type":"Double","basicValue":-1}],[{"type":"Double","basicValue":2}],[{"type":"Double","basicValue":6}]]},"components":{"type":"Array","elements":[[{"type":"Double","basicValue":1}]]},"explained_variance_ratio":{"type":"Array","elements":[[{"type":"Double","basicValue":1}]]},"explained_variance":{"type":"Array","elements":[[{"type":"Double","basicValue":16.5}]]},"singular_values":{"type":"Array","elements":[[{"type":"Double","basicValue":8.12404}]]},"feature_means":{"type":"Array","elements":[[{"type":"Double","basicValue":5}]]}}}
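With a single feature, PCA degenerates to mean-centering: the scores are the centered values (up to sign) and the single loading is ±1, which is why the scores above are simply the data minus its mean of 5. A minimal check:

```python
import numpy as np
from sklearn.decomposition import PCA

# One-feature PCA: scores equal the centered values up to sign.
x = np.array([[1.0], [2.0], [4.0], [7.0], [11.0]])
T = PCA(n_components=1, svd_solver="full").fit_transform(x)
centered = (x - x.mean()).ravel()
print(np.allclose(np.abs(T.ravel()), np.abs(centered)))  # True
```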

Example 4: Use a seeded randomized PCA fit on a wider matrix

Inputs:

data        n_components  whiten  pca_solver  random_state
1  0  2  1  2             FALSE   randomized  7
2  1  3  1
3  1  4  2
4  2  5  3
5  3  6  5
6  5  7  8

Excel formula:

=PCA_REDUCE({1,0,2,1;2,1,3,1;3,1,4,2;4,2,5,3;5,3,6,5;6,5,7,8}, 2, FALSE, "randomized", 7)

Expected output:

{"type":"Double","basicValue":0.996287,"properties":{"explained_variance_ratio_sum":{"type":"Double","basicValue":0.996287},"component_count":{"type":"Double","basicValue":2},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":4},"scores":{"type":"Array","elements":[[{"type":"Double","basicValue":-4.59092},{"type":"Double","basicValue":0.905985}],[{"type":"Double","basicValue":-3.27712},{"type":"Double","basicValue":-0.00260405}],[{"type":"Double","basicValue":-1.7394},{"type":"Double","basicValue":-0.460947}],[{"type":"Double","basicValue":0.225926},{"type":"Double","basicValue":-0.747519}],[{"type":"Double","basicValue":2.84277},{"type":"Double","basicValue":-0.412074}],[{"type":"Double","basicValue":6.53874},{"type":"Double","basicValue":0.717159}]]},"components":{"type":"Array","elements":[[{"type":"Double","basicValue":0.443101},{"type":"Double","basicValue":0.427595},{"type":"Double","basicValue":0.443101},{"type":"Double","basicValue":0.651526}],[{"type":"Double","basicValue":-0.54018},{"type":"Double","basicValue":0.171771},{"type":"Double","basicValue":-0.54018},{"type":"Double","basicValue":0.622017}]]},"explained_variance_ratio":{"type":"Array","elements":[[{"type":"Double","basicValue":0.970519}],[{"type":"Double","basicValue":0.0257683}]]},"explained_variance":{"type":"Array","elements":[[{"type":"Double","basicValue":17.1458}],[{"type":"Double","basicValue":0.455239}]]},"singular_values":{"type":"Array","elements":[[{"type":"Double","basicValue":9.259}],[{"type":"Double","basicValue":1.50871}]]},"feature_means":{"type":"Array","elements":[[{"type":"Double","basicValue":3.5}],[{"type":"Double","basicValue":2}],[{"type":"Double","basicValue":4.5}],[{"type":"Double","basicValue":3.33333}]]}}}
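The randomized solver is stochastic, so fixing random_state is what makes repeated fits reproducible; a sketch using the same matrix as Example 4:

```python
import numpy as np
from sklearn.decomposition import PCA

# Same seed, same data: repeated randomized fits return identical scores.
X = np.array([[1, 0, 2, 1], [2, 1, 3, 1], [3, 1, 4, 2],
              [4, 2, 5, 3], [5, 3, 6, 5], [6, 5, 7, 8]], dtype=float)
runs = [PCA(n_components=2, svd_solver="randomized", random_state=7)
        .fit_transform(X) for _ in range(2)]
print(np.allclose(runs[0], runs[1]))  # True
```

Leaving random_state blank lets scikit-learn draw a fresh seed each call, so scores from the randomized path may then vary between recalculations.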

Python Code

import numpy as np
from sklearn.decomposition import PCA as SklearnPCA

def pca_reduce(data, n_components=None, whiten=False, pca_solver='auto', random_state=None):
    """
    Fit principal component analysis and return projected scores with variance summaries.

    See: https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        data (list[list]): 2D array of numeric input data with rows as samples and columns as features.
        n_components (int, optional): Number of principal components to keep. Leave blank to keep the estimator default. Default is None.
        whiten (bool, optional): Whether to rescale components to unit variance after projection. Default is False.
        pca_solver (str, optional): Singular value decomposition strategy used to fit the PCA model. Valid options: 'auto', 'full', 'randomized', 'arpack'. Default is 'auto'.
        random_state (int, optional): Integer seed used when the randomized solver path is active. Leave blank for the estimator default. Default is None.

    Returns:
        dict: Excel data type containing projected scores, component loadings, and explained variance summaries.
    """
    def py(value):
        return value.item() if isinstance(value, np.generic) else value

    def cell(value):
        value = py(value)
        if isinstance(value, bool):
            return {"type": "Boolean", "basicValue": bool(value)}
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            return {"type": "Double", "basicValue": float(value)}
        return {"type": "String", "basicValue": str(value)}

    def col(values):
        return [[cell(value)] for value in values]

    def mat(values):
        return [[cell(value) for value in row] for row in values]

    def parse_data(value):
        value = [[value]] if not isinstance(value, list) else value
        if not isinstance(value, list) or not value or not all(isinstance(row, list) and row for row in value):
            return None, "Error: data must be a non-empty 2D list"
        if len({len(row) for row in value}) != 1:
            return None, "Error: data must be a rectangular 2D list"
        data_np = np.array(value, dtype=float)
        if data_np.ndim != 2 or data_np.size == 0:
            return None, "Error: data must be a non-empty 2D list"
        if not np.isfinite(data_np).all():
            return None, "Error: data must contain only finite numeric values"
        if data_np.shape[0] < 2:
            return None, "Error: data must contain at least 2 samples"
        return data_np, None

    def orient_projection(scores, components):
        score_np = np.array(scores, dtype=float, copy=True)
        component_np = np.array(components, dtype=float, copy=True)
        limit = min(score_np.shape[1], component_np.shape[0])
        for index in range(limit):
            component_row = component_np[index, :]
            pivot = int(np.argmax(np.abs(component_row)))
            pivot_value = component_row[pivot]
            if pivot_value == 0 and score_np.shape[0] > 0:
                score_column = score_np[:, index]
                pivot_value = score_column[int(np.argmax(np.abs(score_column)))]
            if pivot_value < 0:
                component_np[index, :] *= -1.0
                score_np[:, index] *= -1.0
        return score_np, component_np

    try:
        data_np, error = parse_data(data)
        if error:
            return error

        component_total = None if n_components in (None, "") else int(n_components)
        max_components = min(data_np.shape[0], data_np.shape[1])
        if component_total is not None and (component_total < 1 or component_total > max_components):
            return f"Error: n_components must be between 1 and {max_components}"

        solver_value = str(pca_solver).strip().lower()
        if solver_value not in {"auto", "full", "randomized", "arpack"}:
            return "Error: pca_solver must be 'auto', 'full', 'randomized', or 'arpack'"
        if solver_value == "arpack" and (component_total is None or component_total >= max_components):
            return "Error: n_components must be set below min(n_samples, n_features) when pca_solver is 'arpack'"

        fitted = SklearnPCA(
            n_components=component_total,
            whiten=bool(whiten),
            svd_solver=solver_value,
            random_state=None if random_state in (None, "") else int(random_state)
        )

        scores_np = np.asarray(fitted.fit_transform(data_np), dtype=float)
        components_np = np.asarray(fitted.components_, dtype=float)
        scores_np, components_np = orient_projection(scores_np, components_np)
        variance_ratio = np.atleast_1d(np.asarray(fitted.explained_variance_ratio_, dtype=float))
        explained_variance = np.atleast_1d(np.asarray(fitted.explained_variance_, dtype=float))
        singular_values = np.atleast_1d(np.asarray(fitted.singular_values_, dtype=float))
        mean_values = np.atleast_1d(np.asarray(fitted.mean_, dtype=float))
        variance_ratio_sum = float(np.sum(variance_ratio))

        return {
            "type": "Double",
            "basicValue": variance_ratio_sum,
            "properties": {
                "explained_variance_ratio_sum": {"type": "Double", "basicValue": variance_ratio_sum},
                "component_count": {"type": "Double", "basicValue": float(components_np.shape[0])},
                "sample_count": {"type": "Double", "basicValue": float(data_np.shape[0])},
                "feature_count": {"type": "Double", "basicValue": float(data_np.shape[1])},
                "scores": {"type": "Array", "elements": mat(scores_np.tolist())},
                "components": {"type": "Array", "elements": mat(components_np.tolist())},
                "explained_variance_ratio": {"type": "Array", "elements": col(variance_ratio.tolist())},
                "explained_variance": {"type": "Array", "elements": col(explained_variance.tolist())},
                "singular_values": {"type": "Array", "elements": col(singular_values.tolist())},
                "feature_means": {"type": "Array", "elements": col(mean_values.tolist())}
            }
        }
    except Exception as e:
        return f"Error: {str(e)}"
