GB_REGRESS

Gradient boosting regression builds a sequence of shallow trees that iteratively correct earlier residual errors. It is a flexible nonlinear regressor for tabular data and exposes feature-importance estimates.

The ensemble builds an additive model in a forward stage-wise fashion:

F_m(x) = F_{m-1}(x) + \nu \sum_{j=1}^{J_m} \gamma_{jm} I(x \in R_{jm})

where \nu is the learning rate, R_{jm} are the J_m leaf regions of the tree fitted at stage m, and \gamma_{jm} are the values fitted in those leaves. Each stage adds one tree chosen to minimize the specified loss on the current residuals.
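The stage-wise update above can be sketched directly for squared-error loss, where the negative gradient is just the residual. This is an illustration using scikit-learn's DecisionTreeRegressor as the base learner, not the wrapper's actual implementation:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boost_sketch(X, y, n_stages=50, nu=0.1, max_depth=2):
    # F_0: the constant that minimizes squared loss is the target mean.
    F = np.full(len(y), y.mean())
    for _ in range(n_stages):
        residual = y - F  # negative gradient of squared loss
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residual)
        F = F + nu * tree.predict(X)  # F_m = F_{m-1} + nu * h_m(x)
    return F

X = np.arange(6, dtype=float).reshape(-1, 1)
y = np.array([1.0, 3.0, 5.0, 7.0, 9.0, 11.0])
F = boost_sketch(X, y)
```

With shrinkage nu = 0.1, each stage moves the ensemble only a fraction of the way toward the current residuals, which is why many stages are typically needed.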

This wrapper accepts rows as samples and a numeric target supplied as a single row or single column. It returns the training R^2 together with fitted predictions, residuals, and fitted feature importances.

Excel Usage

=GB_REGRESS(data, target, n_estimators, learning_rate, max_depth, subsample, random_state)
  • data (list[list], required): 2D array of numeric feature data with rows as samples and columns as features.
  • target (list[list], required): Numeric target values as a single row, single column, or scalar when only one sample is present.
  • n_estimators (int, optional, default: 100): Number of boosting stages to fit.
  • learning_rate (float, optional, default: 0.1): Shrinkage factor applied to each boosting stage.
  • max_depth (int, optional, default: 3): Maximum depth of each individual regression tree.
  • subsample (float, optional, default: 1): Fraction of samples used to fit each boosting stage.
  • random_state (int, optional, default: null): Integer seed for reproducible boosting and tree construction. Leave blank for the estimator default.

Returns (dict): Excel data type containing training R^2, predictions, residuals, and fitted feature importances.
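The n_estimators and learning_rate parameters trade off against one another: a smaller shrinkage factor generally needs more boosting stages to reach a comparable training fit. A small sketch of that trade-off on synthetic data (settings here are illustrative, not recommendations):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

X = np.arange(10, dtype=float).reshape(-1, 1)
y = X.ravel() ** 2  # simple nonlinear target

# Fewer stages with a larger learning rate...
fast = GradientBoostingRegressor(n_estimators=50, learning_rate=0.2,
                                 max_depth=2, random_state=0).fit(X, y)
# ...versus more stages with a smaller learning rate.
slow = GradientBoostingRegressor(n_estimators=200, learning_rate=0.05,
                                 max_depth=2, random_state=0).fit(X, y)

fast_r2, slow_r2 = fast.score(X, y), slow.score(X, y)
```

Both configurations fit the training data closely; smaller learning rates are often preferred for generalization, at the cost of more stages.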

Example 1: Fit a gradient boosting regressor on a two-feature linear trend

Inputs:

data      target
0, 0      1
1, 0      3
0, 1      4
1, 1      6
2, 1      8
2, 2      11

n_estimators: 60, learning_rate: 0.1, max_depth: 2, subsample: 1, random_state: 0

Excel formula:

=GB_REGRESS({0,0;1,0;0,1;1,1;2,1;2,2}, {1;3;4;6;8;11}, 60, 0.1, 2, 1, 0)

Expected output:

{"type":"Double","basicValue":0.999957,"properties":{"training_r2":{"type":"Double","basicValue":0.999957},"mean_squared_error":{"type":"Double","basicValue":0.000465387},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":2},"predictions":{"type":"Array","elements":[[{"type":"Double","basicValue":1.02009}],[{"type":"Double","basicValue":3.01012}],[{"type":"Double","basicValue":4.01197}],[{"type":"Double","basicValue":5.9665}],[{"type":"Double","basicValue":8.01783}],[{"type":"Double","basicValue":10.9735}]]},"residuals":{"type":"Array","elements":[[{"type":"Double","basicValue":-0.0200879}],[{"type":"Double","basicValue":-0.0101206}],[{"type":"Double","basicValue":-0.0119703}],[{"type":"Double","basicValue":0.0334958}],[{"type":"Double","basicValue":-0.017833}],[{"type":"Double","basicValue":0.0265161}]]},"feature_importances":{"type":"Array","elements":[[{"type":"Double","basicValue":0.752868}],[{"type":"Double","basicValue":0.247132}]]},"estimator_count":{"type":"Double","basicValue":60}}}
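Example 1 can be reproduced by calling the underlying scikit-learn estimator directly; the exact decimals may differ slightly across scikit-learn versions, but the training R^2 should stay close to the value shown:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

X = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2, 1], [2, 2]], dtype=float)
y = np.array([1.0, 3.0, 4.0, 6.0, 8.0, 11.0])

model = GradientBoostingRegressor(n_estimators=60, learning_rate=0.1,
                                  max_depth=2, subsample=1.0,
                                  random_state=0).fit(X, y)
r2 = model.score(X, y)                      # training R^2
importances = model.feature_importances_    # one weight per feature
```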

Example 2: Flatten a single-row numeric target range for gradient boosting regression

Inputs:

data (single column)    target (single row)
0                       0, 1, 4, 9, 16, 25
1
2
3
4
5

n_estimators: 80, learning_rate: 0.1, max_depth: 2, subsample: 1, random_state: 0

Excel formula:

=GB_REGRESS({0;1;2;3;4;5}, {0,1,4,9,16,25}, 80, 0.1, 2, 1, 0)

Expected output:

{"type":"Double","basicValue":1,"properties":{"training_r2":{"type":"Double","basicValue":1},"mean_squared_error":{"type":"Double","basicValue":0.00000939573},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":1},"predictions":{"type":"Array","elements":[[{"type":"Double","basicValue":0.00469531}],[{"type":"Double","basicValue":1.00243}],[{"type":"Double","basicValue":4.00045}],[{"type":"Double","basicValue":8.99966}],[{"type":"Double","basicValue":15.9974}],[{"type":"Double","basicValue":24.9954}]]},"residuals":{"type":"Array","elements":[[{"type":"Double","basicValue":-0.00469531}],[{"type":"Double","basicValue":-0.00242558}],[{"type":"Double","basicValue":-0.000453668}],[{"type":"Double","basicValue":0.000338078}],[{"type":"Double","basicValue":0.00263296}],[{"type":"Double","basicValue":0.00460352}]]},"feature_importances":{"type":"Array","elements":[[{"type":"Double","basicValue":1}]]},"estimator_count":{"type":"Double","basicValue":80}}}
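The wrapper flattens a single-row target to the same vector as a single-column target before fitting. A minimal sketch of that rule (the helper name is illustrative, not part of the wrapper's API):

```python
import numpy as np

def flatten_target(target):
    # Accept a single row ([[0, 1, 4, ...]]) or a single column
    # ([[0], [1], [4], ...]) and return a 1-D float array.
    arr = np.asarray(target, dtype=float)
    if arr.ndim == 2 and 1 in arr.shape:
        arr = arr.reshape(-1)
    return arr

row = [[0, 1, 4, 9, 16, 25]]
col = [[0], [1], [4], [9], [16], [25]]
```

Either orientation yields the same 1-D target, which is why the Excel formula in this example can pass the target as a horizontal range.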

Example 3: Fit gradient boosting regression with stochastic subsampling

Inputs:

data      target
0         1
1         3
2         5
3         7
4         9
5         11

n_estimators: 80, learning_rate: 0.1, max_depth: 2, subsample: 0.8, random_state: 0

Excel formula:

=GB_REGRESS({0;1;2;3;4;5}, {1;3;5;7;9;11}, 80, 0.1, 2, 0.8, 0)

Expected output:

{"type":"Double","basicValue":0.999932,"properties":{"training_r2":{"type":"Double","basicValue":0.999932},"mean_squared_error":{"type":"Double","basicValue":0.000790625},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":1},"predictions":{"type":"Array","elements":[[{"type":"Double","basicValue":1.0257}],[{"type":"Double","basicValue":2.96377}],[{"type":"Double","basicValue":5.04344}],[{"type":"Double","basicValue":6.97499}],[{"type":"Double","basicValue":9.00422}],[{"type":"Double","basicValue":10.9845}]]},"residuals":{"type":"Array","elements":[[{"type":"Double","basicValue":-0.0256953}],[{"type":"Double","basicValue":0.0362316}],[{"type":"Double","basicValue":-0.0434387}],[{"type":"Double","basicValue":0.0250072}],[{"type":"Double","basicValue":-0.00422126}],[{"type":"Double","basicValue":0.0155135}]]},"feature_importances":{"type":"Array","elements":[[{"type":"Double","basicValue":1}]]},"estimator_count":{"type":"Double","basicValue":80}}}
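With subsample below 1, each stage is fitted on a random subset of rows, so results depend on the random state. A fixed random_state makes the stochastic fit reproducible, as this sketch illustrates:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

X = np.arange(6, dtype=float).reshape(-1, 1)
y = np.array([1.0, 3.0, 5.0, 7.0, 9.0, 11.0])

def fit_preds(seed):
    # Stochastic gradient boosting: each stage sees 80% of the rows.
    model = GradientBoostingRegressor(n_estimators=80, learning_rate=0.1,
                                      max_depth=2, subsample=0.8,
                                      random_state=seed)
    return model.fit(X, y).predict(X)

same_seed_matches = np.allclose(fit_preds(0), fit_preds(0))
```

Leaving random_state blank hands seeding to the estimator default, so repeated calls may return slightly different outputs when subsample < 1.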

Example 4: Fit a gradient boosting regressor on a piecewise two-feature target

Inputs:

data      target
0, 0      0
0, 1      1
1, 0      1
1, 1      3
2, 1      5
2, 2      8

n_estimators: 80, learning_rate: 0.1, max_depth: 2, subsample: 1, random_state: 0

Excel formula:

=GB_REGRESS({0,0;0,1;1,0;1,1;2,1;2,2}, {0;1;1;3;5;8}, 80, 0.1, 2, 1, 0)

Expected output:

{"type":"Double","basicValue":0.999835,"properties":{"training_r2":{"type":"Double","basicValue":0.999835},"mean_squared_error":{"type":"Double","basicValue":0.00126717},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":2},"predictions":{"type":"Array","elements":[[{"type":"Double","basicValue":0.00288363}],[{"type":"Double","basicValue":1.02073}],[{"type":"Double","basicValue":1.02255}],[{"type":"Double","basicValue":2.93344}],[{"type":"Double","basicValue":5.04196}],[{"type":"Double","basicValue":7.97844}]]},"residuals":{"type":"Array","elements":[[{"type":"Double","basicValue":-0.00288363}],[{"type":"Double","basicValue":-0.0207312}],[{"type":"Double","basicValue":-0.0225512}],[{"type":"Double","basicValue":0.0665643}],[{"type":"Double","basicValue":-0.0419602}],[{"type":"Double","basicValue":0.021562}]]},"feature_importances":{"type":"Array","elements":[[{"type":"Double","basicValue":0.836253}],[{"type":"Double","basicValue":0.163747}]]},"estimator_count":{"type":"Double","basicValue":80}}}
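The feature_importances array in this example attributes most of the impurity reduction to the first feature. Those importances are non-negative and sum to 1, which can be checked by refitting the underlying estimator (exact values may shift slightly between scikit-learn versions):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [2, 1], [2, 2]], dtype=float)
y = np.array([0.0, 1.0, 1.0, 3.0, 5.0, 8.0])

model = GradientBoostingRegressor(n_estimators=80, learning_rate=0.1,
                                  max_depth=2, random_state=0).fit(X, y)
imp = model.feature_importances_  # impurity-based importances, normalized
```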

Python Code

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor as SklearnGradientBoostingRegressor

def gb_regress(data, target, n_estimators=100, learning_rate=0.1, max_depth=3, subsample=1, random_state=None):
    """
    Fit a gradient boosting regressor and return training predictions.

    See: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingRegressor.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        data (list[list]): 2D array of numeric feature data with rows as samples and columns as features.
        target (list[list]): Numeric target values as a single row, single column, or scalar when only one sample is present.
        n_estimators (int, optional): Number of boosting stages to fit. Default is 100.
        learning_rate (float, optional): Shrinkage factor applied to each boosting stage. Default is 0.1.
        max_depth (int, optional): Maximum depth of each individual regression tree. Default is 3.
        subsample (float, optional): Fraction of samples used to fit each boosting stage. Default is 1.
        random_state (int, optional): Integer seed for reproducible boosting and tree construction. Leave blank for the estimator default. Default is None.

    Returns:
        dict: Excel data type containing training R^2, predictions, residuals, and fitted feature importances.
    """
    def py(value):
        return value.item() if isinstance(value, np.generic) else value

    def cell(value):
        value = py(value)
        if isinstance(value, bool):
            return {"type": "Boolean", "basicValue": bool(value)}
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            return {"type": "Double", "basicValue": float(value)}
        return {"type": "String", "basicValue": str(value)}

    def col(values):
        return [[cell(value)] for value in values]

    def parse_data(value):
        value = [[value]] if not isinstance(value, list) else value
        if not isinstance(value, list) or not value or not all(isinstance(row, list) and row for row in value):
            return None, "Error: data must be a non-empty 2D list"
        if len({len(row) for row in value}) != 1:
            return None, "Error: data must be a rectangular 2D list"
        data_np = np.array(value, dtype=float)
        if data_np.ndim != 2 or data_np.size == 0:
            return None, "Error: data must be a non-empty 2D list"
        if not np.isfinite(data_np).all():
            return None, "Error: data must contain only finite numeric values"
        return data_np, None

    def parse_target(value, sample_count):
        if not isinstance(value, list):
            labels = [value]
        elif not value:
            return None, "Error: target must be non-empty"
        elif all(not isinstance(item, list) for item in value):
            labels = value
        elif len(value) == 1:
            labels = value[0]
        elif all(isinstance(row, list) and len(row) == 1 for row in value):
            labels = [row[0] for row in value]
        else:
            return None, "Error: target must be a single row or column"

        if len(labels) != sample_count:
            return None, "Error: target length must match sample count"

        parsed = []
        for item in labels:
            item = py(item)
            if isinstance(item, bool) or not isinstance(item, (int, float)):
                return None, "Error: target values must be finite numeric scalars"
            if not np.isfinite(float(item)):
                return None, "Error: target values must be finite numeric scalars"
            parsed.append(float(item))
        return np.array(parsed, dtype=float), None

    def flat_float_list(values):
        return [float(py(item)) for item in np.asarray(values).reshape(-1).tolist()]

    try:
        data_np, error = parse_data(data)
        if error:
            return error

        target_np, error = parse_target(target, data_np.shape[0])
        if error:
            return error

        if int(n_estimators) < 1:
            return "Error: n_estimators must be at least 1"
        if float(learning_rate) <= 0:
            return "Error: learning_rate must be greater than 0"
        if int(max_depth) < 1:
            return "Error: max_depth must be at least 1"
        if float(subsample) <= 0 or float(subsample) > 1:
            return "Error: subsample must be greater than 0 and at most 1"

        fitted = SklearnGradientBoostingRegressor(
            n_estimators=int(n_estimators),
            learning_rate=float(learning_rate),
            max_depth=int(max_depth),
            subsample=float(subsample),
            random_state=None if random_state in (None, "") else int(random_state)
        ).fit(data_np, target_np)

        prediction_array = np.asarray(fitted.predict(data_np), dtype=float)
        residual_array = target_np - prediction_array
        predictions = flat_float_list(prediction_array)
        residuals = flat_float_list(residual_array)
        training_r2 = float(fitted.score(data_np, target_np))
        mse = float(np.mean(np.square(residual_array)))

        return {
            "type": "Double",
            "basicValue": training_r2,
            "properties": {
                "training_r2": {"type": "Double", "basicValue": training_r2},
                "mean_squared_error": {"type": "Double", "basicValue": mse},
                "sample_count": {"type": "Double", "basicValue": float(data_np.shape[0])},
                "feature_count": {"type": "Double", "basicValue": float(data_np.shape[1])},
                "predictions": {"type": "Array", "elements": col(predictions)},
                "residuals": {"type": "Array", "elements": col(residuals)},
                "feature_importances": {"type": "Array", "elements": col(fitted.feature_importances_.tolist())},
                "estimator_count": {"type": "Double", "basicValue": float(fitted.n_estimators_)}
            }
        }
    except Exception as e:
        return f"Error: {str(e)}"
