KERNEL_PCA

Kernel PCA performs principal component analysis in an implicit feature space defined by a kernel function. It can capture nonlinear structure that ordinary linear PCA cannot represent directly.

The kernel trick replaces dot products in the feature space with kernel function evaluations, so the mapping \phi never has to be computed explicitly:

K(x_i, x_j) = \phi(x_i)^T \phi(x_j)

where \phi is the implicit nonlinear mapping.
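For intuition, this identity can be checked directly for the homogeneous degree-2 polynomial kernel on 2-D inputs, where \phi is known in closed form. A self-contained sketch (not part of the wrapper below):

```python
import math

def poly2_kernel(x, y):
    # K(x, y) = (x . y)^2: homogeneous polynomial kernel of degree 2
    return sum(a * b for a, b in zip(x, y)) ** 2

def phi(x):
    # Explicit feature map realizing that kernel for 2-D inputs:
    # phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2)
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

x, y = (1.0, 2.0), (3.0, 0.5)
lhs = poly2_kernel(x, y)                          # kernel evaluation: 16.0
rhs = sum(a * b for a, b in zip(phi(x), phi(y)))  # explicit feature-space dot product: 16.0
```

The two sides agree, which is exactly what lets kernel PCA work in the high-dimensional space without ever materializing \phi.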

This wrapper accepts rows as samples and columns as features. It returns the nonlinear embedding together with retained eigenvalues and the number of extracted components.
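Internally, fitting amounts to double-centering the kernel (Gram) matrix, eigendecomposing it, and scaling the top eigenvectors by the square roots of their eigenvalues. A minimal numpy sketch of that procedure (an illustration of the general algorithm, not the wrapper's exact code path):

```python
import numpy as np

def kpca_embed(K, n_components=2):
    # Double-center the Gram matrix so the implicit features are zero-mean
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    Kc = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    # Symmetric eigendecomposition; eigh returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(Kc)
    order = np.argsort(eigvals)[::-1][:n_components]
    lam = eigvals[order]
    vec = eigvecs[:, order]
    # Training-point embedding: eigenvectors scaled by sqrt(eigenvalue)
    return vec * np.sqrt(np.clip(lam, 0.0, None)), lam

# With a linear kernel (K = X X^T) this reduces to ordinary PCA scores, up to sign
X = np.array([[1.0, 0.0], [2.0, 1.0], [3.0, 1.0]])
embedding, lam = kpca_embed(X @ X.T, n_components=1)
```

For the linear kernel the retained eigenvalues equal those of classical PCA on the centered data, which is why the wrapper's `linear` examples below look like ordinary PCA output.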

Excel Usage

=KERNEL_PCA(data, n_components, kpca_kernel, gamma, degree, coef_zero, kpca_solver, random_state)
  • data (list[list], required): 2D array of numeric input data with rows as samples and columns as features.
  • n_components (int, optional, default: 2): Number of nonlinear components to keep.
  • kpca_kernel (str, optional, default: 'linear'): Kernel function used to build the implicit feature space.
  • gamma (float, optional, default: null): Kernel coefficient for RBF, polynomial, and sigmoid kernels. Leave blank to keep the estimator default.
  • degree (int, optional, default: 3): Polynomial degree when using the polynomial kernel.
  • coef_zero (float, optional, default: 1): Independent kernel term for polynomial and sigmoid kernels.
  • kpca_solver (str, optional, default: 'auto'): Eigensolver used to extract kernel principal components.
  • random_state (int, optional, default: null): Integer seed used by randomized or ARPACK eigensolver paths. Leave blank for the estimator default.

Returns (dict): Excel data type containing the nonlinear embedding and retained kernel eigenvalues.
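The kernel options correspond to the following formulas (scikit-learn conventions; a standalone sketch reusing the gamma, degree, and coef_zero parameter names from above):

```python
import math

def kernel_value(x, y, kpca_kernel="linear", gamma=1.0, degree=3, coef_zero=1.0):
    # Kernel formulas behind the kpca_kernel options (scikit-learn conventions)
    dot = sum(a * b for a, b in zip(x, y))
    if kpca_kernel == "linear":
        return dot
    if kpca_kernel == "rbf":
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
        return math.exp(-gamma * sq_dist)
    if kpca_kernel == "poly":
        return (gamma * dot + coef_zero) ** degree
    if kpca_kernel == "sigmoid":
        return math.tanh(gamma * dot + coef_zero)
    if kpca_kernel == "cosine":
        norm_x = math.sqrt(sum(a * a for a in x))
        norm_y = math.sqrt(sum(b * b for b in y))
        return dot / (norm_x * norm_y)
    raise ValueError(f"unknown kernel: {kpca_kernel}")

# RBF value for (0,0) and (1,1) with gamma=0.5: exp(-0.5 * 2) = exp(-1)
k = kernel_value((0.0, 0.0), (1.0, 1.0), kpca_kernel="rbf", gamma=0.5)
```

Note that gamma, degree, and coef_zero only affect the kernels that use them; for the linear and cosine kernels they are ignored.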

Example 1: Compute a linear kernel PCA embedding with the dense eigensolver

Inputs:

data n_components kpca_kernel gamma degree coef_zero kpca_solver random_state
1 0 2 linear 3 1 dense 0
2 1
3 1
4 2
5 3
6 5

Excel formula:

=KERNEL_PCA({1,0;2,1;3,1;4,2;5,3;6,5}, 2, "linear", , 3, 1, "dense", 0)

Expected output:

{"type":"Double","basicValue":33.5,"properties":{"retained_eigenvalue_sum":{"type":"Double","basicValue":33.5},"component_count":{"type":"Double","basicValue":2},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":2},"embedding":{"type":"Array","elements":[[{"type":"Double","basicValue":-3.18939},{"type":"Double","basicValue":0.27894}],[{"type":"Double","basicValue":-1.77556},{"type":"Double","basicValue":0.312058}],[{"type":"Double","basicValue":-1.05209},{"type":"Double","basicValue":-0.378295}],[{"type":"Double","basicValue":0.361736},{"type":"Double","basicValue":-0.345177}],[{"type":"Double","basicValue":1.77556},{"type":"Double","basicValue":-0.312058}],[{"type":"Double","basicValue":3.87974},{"type":"Double","basicValue":0.444532}]]},"eigenvalues":{"type":"Array","elements":[[{"type":"Double","basicValue":32.7676}],[{"type":"Double","basicValue":0.732432}]]}}}

Example 2: Embed nonlinear structure with an RBF kernel

Inputs:

data n_components kpca_kernel gamma degree coef_zero kpca_solver random_state
0 0 2 rbf 0.5 3 1 dense 0
0 1
1 0
1 1
3 3
3 4
4 3
4 4

Excel formula:

=KERNEL_PCA({0,0;0,1;1,0;1,1;3,3;3,4;4,3;4,4}, 2, "rbf", 0.5, 3, 1, "dense", 0)

Expected output:

{"type":"Double","basicValue":3.21596,"properties":{"retained_eigenvalue_sum":{"type":"Double","basicValue":3.21596},"component_count":{"type":"Double","basicValue":2},"sample_count":{"type":"Double","basicValue":8},"feature_count":{"type":"Double","basicValue":2},"embedding":{"type":"Array","elements":[[{"type":"Double","basicValue":0.569339},{"type":"Double","basicValue":-0.397182}],[{"type":"Double","basicValue":0.568369},{"type":"Double","basicValue":-0.00317267}],[{"type":"Double","basicValue":0.568369},{"type":"Double","basicValue":-0.00317267}],[{"type":"Double","basicValue":0.563152},{"type":"Double","basicValue":0.403527}],[{"type":"Double","basicValue":-0.563152},{"type":"Double","basicValue":0.403527}],[{"type":"Double","basicValue":-0.568369},{"type":"Double","basicValue":-0.00317267}],[{"type":"Double","basicValue":-0.568369},{"type":"Double","basicValue":-0.00317267}],[{"type":"Double","basicValue":-0.569339},{"type":"Double","basicValue":-0.397182}]]},"eigenvalues":{"type":"Array","elements":[[{"type":"Double","basicValue":2.57475}],[{"type":"Double","basicValue":0.641216}]]}}}

Example 3: Use a polynomial kernel to capture curved structure

Inputs:

data n_components kpca_kernel gamma degree coef_zero kpca_solver random_state
-2 4 2 poly 0.5 2 1 dense 0
-1 1
0 0
1 1
2 4
3 9

Excel formula:

=KERNEL_PCA({-2,4;-1,1;0,0;1,1;2,4;3,9}, 2, "poly", 0.5, 2, 1, "dense", 0)

Expected output:

{"type":"Double","basicValue":1666.56,"properties":{"retained_eigenvalue_sum":{"type":"Double","basicValue":1666.56},"component_count":{"type":"Double","basicValue":2},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":2},"embedding":{"type":"Array","elements":[[{"type":"Double","basicValue":-5.31672},{"type":"Double","basicValue":7.70236}],[{"type":"Double","basicValue":-10.2249},{"type":"Double","basicValue":-0.491365}],[{"type":"Double","basicValue":-10.4901},{"type":"Double","basicValue":-1.83236}],[{"type":"Double","basicValue":-9.44703},{"type":"Double","basicValue":-2.28291}],[{"type":"Double","basicValue":0.0169504},{"type":"Double","basicValue":-2.96006}],[{"type":"Double","basicValue":35.4618},{"type":"Double","basicValue":-0.135669}]]},"eigenvalues":{"type":"Array","elements":[[{"type":"Double","basicValue":1589.65}],[{"type":"Double","basicValue":76.9174}]]}}}

Example 4: Use the ARPACK solver for a seeded compact embedding

Inputs:

data n_components kpca_kernel gamma degree coef_zero kpca_solver random_state
1 1 0 2 linear 3 1 arpack 4
2 1 1
3 2 1
4 3 2
5 5 3
6 8 5

Excel formula:

=KERNEL_PCA({1,1,0;2,1,1;3,2,1;4,3,2;5,5,3;6,8,5}, 2, "linear", , 3, 1, "arpack", 4)

Expected output:

{"type":"Double","basicValue":70.5138,"properties":{"retained_eigenvalue_sum":{"type":"Double","basicValue":70.5138},"component_count":{"type":"Double","basicValue":2},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":3},"embedding":{"type":"Array","elements":[[{"type":"Double","basicValue":-3.88006},{"type":"Double","basicValue":0.776081}],[{"type":"Double","basicValue":-2.9143},{"type":"Double","basicValue":-0.0346085}],[{"type":"Double","basicValue":-1.69564},{"type":"Double","basicValue":-0.332132}],[{"type":"Double","basicValue":0.000600645},{"type":"Double","basicValue":-0.600607}],[{"type":"Double","basicValue":2.42731},{"type":"Double","basicValue":-0.326868}],[{"type":"Double","basicValue":6.06208},{"type":"Double","basicValue":0.518135}]]},"eigenvalues":{"type":"Array","elements":[[{"type":"Double","basicValue":69.0639}],[{"type":"Double","basicValue":1.44985}]]}}}

Python Code

import numpy as np
from sklearn.decomposition import KernelPCA as SklearnKernelPCA

def kernel_pca(data, n_components=2, kpca_kernel='linear', gamma=None, degree=3, coef_zero=1, kpca_solver='auto', random_state=None):
    """
    Fit kernel PCA and return nonlinear embeddings with retained eigenvalue summaries.

    See: https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.KernelPCA.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        data (list[list]): 2D array of numeric input data with rows as samples and columns as features.
        n_components (int, optional): Number of nonlinear components to keep. Default is 2.
        kpca_kernel (str, optional): Kernel function used to build the implicit feature space. Valid options: 'linear', 'rbf', 'poly', 'sigmoid', 'cosine'. Default is 'linear'.
        gamma (float, optional): Kernel coefficient for RBF, polynomial, and sigmoid kernels. Leave blank to keep the estimator default. Default is None.
        degree (int, optional): Polynomial degree when using the polynomial kernel. Default is 3.
        coef_zero (float, optional): Independent kernel term for polynomial and sigmoid kernels. Default is 1.
        kpca_solver (str, optional): Eigensolver used to extract kernel principal components. Valid options: 'auto', 'dense', 'arpack', 'randomized'. Default is 'auto'.
        random_state (int, optional): Integer seed used by randomized or ARPACK eigensolver paths. Leave blank for the estimator default. Default is None.

    Returns:
        dict: Excel data type containing the nonlinear embedding and retained kernel eigenvalues.
    """
    def py(value):
        return value.item() if isinstance(value, np.generic) else value

    def cell(value):
        value = py(value)
        if isinstance(value, bool):
            return {"type": "Boolean", "basicValue": bool(value)}
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            return {"type": "Double", "basicValue": float(value)}
        return {"type": "String", "basicValue": str(value)}

    def col(values):
        return [[cell(value)] for value in values]

    def mat(values):
        return [[cell(value) for value in row] for row in values]

    def parse_data(value):
        value = [[value]] if not isinstance(value, list) else value
        if not isinstance(value, list) or not value or not all(isinstance(row, list) and row for row in value):
            return None, "Error: data must be a non-empty 2D list"
        if len({len(row) for row in value}) != 1:
            return None, "Error: data must be a rectangular 2D list"
        data_np = np.array(value, dtype=float)
        if data_np.ndim != 2 or data_np.size == 0:
            return None, "Error: data must be a non-empty 2D list"
        if not np.isfinite(data_np).all():
            return None, "Error: data must contain only finite numeric values"
        if data_np.shape[0] < 2:
            return None, "Error: data must contain at least 2 samples"
        return data_np, None

    def orient_embedding(embedding):
        embedding_np = np.array(embedding, dtype=float, copy=True)
        for index in range(embedding_np.shape[1]):
            column = embedding_np[:, index]
            pivot_value = column[int(np.argmax(np.abs(column)))] if column.size else 0.0
            if pivot_value < 0:
                embedding_np[:, index] *= -1.0
        return embedding_np

    try:
        data_np, error = parse_data(data)
        if error:
            return error

        component_total = int(n_components)
        max_components = data_np.shape[0]
        if component_total < 1 or component_total > max_components:
            return f"Error: n_components must be between 1 and {max_components}"

        kernel_value = str(kpca_kernel).strip().lower()
        if kernel_value not in {"linear", "rbf", "poly", "sigmoid", "cosine"}:
            return "Error: kpca_kernel must be 'linear', 'rbf', 'poly', 'sigmoid', or 'cosine'"

        solver_value = str(kpca_solver).strip().lower()
        if solver_value not in {"auto", "dense", "arpack", "randomized"}:
            return "Error: kpca_solver must be 'auto', 'dense', 'arpack', or 'randomized'"
        if solver_value == "arpack" and component_total >= data_np.shape[0]:
            return "Error: n_components must be below the sample count when kpca_solver is 'arpack'"

        if int(degree) < 1:
            return "Error: degree must be at least 1"

        gamma_value = None if gamma in (None, "") else float(gamma)
        if gamma_value is not None and gamma_value <= 0:
            return "Error: gamma must be greater than 0 when provided"

        fitted = SklearnKernelPCA(
            n_components=component_total,
            kernel=kernel_value,
            gamma=gamma_value,
            degree=int(degree),
            coef0=float(coef_zero),
            eigen_solver=solver_value,
            random_state=None if random_state in (None, "") else int(random_state)
        )

        embedding_np = orient_embedding(fitted.fit_transform(data_np))
        eigenvalues = np.atleast_1d(np.asarray(fitted.eigenvalues_, dtype=float))
        retained_sum = float(np.sum(eigenvalues))

        return {
            "type": "Double",
            "basicValue": retained_sum,
            "properties": {
                "retained_eigenvalue_sum": {"type": "Double", "basicValue": retained_sum},
                "component_count": {"type": "Double", "basicValue": float(embedding_np.shape[1])},
                "sample_count": {"type": "Double", "basicValue": float(data_np.shape[0])},
                "feature_count": {"type": "Double", "basicValue": float(data_np.shape[1])},
                "embedding": {"type": "Array", "elements": mat(np.asarray(embedding_np, dtype=float).tolist())},
                "eigenvalues": {"type": "Array", "elements": col(eigenvalues.tolist())}
            }
        }
    except Exception as e:
        return f"Error: {str(e)}"

Online Calculator

2D array of numeric input data with rows as samples and columns as features.
Number of nonlinear components to keep.
Kernel function used to build the implicit feature space.
Kernel coefficient for RBF, polynomial, and sigmoid kernels. Leave blank to keep the estimator default.
Polynomial degree when using the polynomial kernel.
Independent kernel term for polynomial and sigmoid kernels.
Eigensolver used to extract kernel principal components.
Integer seed used by randomized or ARPACK eigensolver paths. Leave blank for the estimator default.