GB_REGRESS
Gradient boosting regression builds a sequence of shallow trees that iteratively correct earlier residual errors. It is a flexible nonlinear regressor for tabular data and exposes feature-importance estimates.
The ensemble builds an additive model in a forward stage-wise fashion:
F_m(x) = F_{m-1}(x) + \nu \sum_{j=1}^{J_m} \gamma_{jm} I(x \in R_{jm})
where \nu is the learning rate, R_{jm} are the terminal regions (leaves) of the tree fitted at stage m, \gamma_{jm} are the corresponding leaf values, and each stage m adds a tree that minimizes the specified loss function.
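For squared loss, the negative gradient at each stage is simply the current residual, so the update above can be sketched in a few lines. This is an illustrative implementation built on scikit-learn's DecisionTreeRegressor; the helper name `manual_boost` is hypothetical and not part of the wrapper:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def manual_boost(X, y, n_stages=80, nu=0.1, max_depth=2):
    # F_0(x): the loss-optimal constant, i.e. the target mean for squared loss.
    pred = np.full(len(y), y.mean())
    for _ in range(n_stages):
        residual = y - pred                    # negative gradient of squared loss
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residual)
        pred = pred + nu * tree.predict(X)     # F_m = F_{m-1} + nu * sum_j gamma_jm I(x in R_jm)
    return pred

X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([1.0, 3.0, 5.0, 7.0, 9.0, 11.0])
pred = manual_boost(X, y)
```

With squared loss and no subsampling this matches the library estimator stage for stage, since the leaf mean of the residuals is already the optimal \gamma_{jm}.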
This wrapper accepts rows as samples and a numeric target supplied as a single row or single column. It returns the training R^2 together with fitted predictions, residuals, and fitted feature importances.
Excel Usage
=GB_REGRESS(data, target, n_estimators, learning_rate, max_depth, subsample, random_state)
data (list[list], required): 2D array of numeric feature data with rows as samples and columns as features.
target (list[list], required): Numeric target values as a single row, single column, or scalar when only one sample is present.
n_estimators (int, optional, default: 100): Number of boosting stages to fit.
learning_rate (float, optional, default: 0.1): Shrinkage factor applied to each boosting stage.
max_depth (int, optional, default: 3): Maximum depth of each individual regression tree.
subsample (float, optional, default: 1): Fraction of samples used to fit each boosting stage.
random_state (int, optional, default: null): Integer seed for reproducible boosting and tree construction. Leave blank for the estimator default.
Returns (dict): Excel data type containing training R^2, predictions, residuals, and fitted feature importances.
Example 1: Fit a gradient boosting regressor on a two-feature linear trend
Inputs:
| data | target | n_estimators | learning_rate | max_depth | subsample | random_state | |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | 60 | 0.1 | 2 | 1 | 0 |
| 1 | 0 | 3 | |||||
| 0 | 1 | 4 | |||||
| 1 | 1 | 6 | |||||
| 2 | 1 | 8 | |||||
| 2 | 2 | 11 |
Excel formula:
=GB_REGRESS({0,0;1,0;0,1;1,1;2,1;2,2}, {1;3;4;6;8;11}, 60, 0.1, 2, 1, 0)
Expected output:
{"type":"Double","basicValue":0.999957,"properties":{"training_r2":{"type":"Double","basicValue":0.999957},"mean_squared_error":{"type":"Double","basicValue":0.000465387},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":2},"predictions":{"type":"Array","elements":[[{"type":"Double","basicValue":1.02009}],[{"type":"Double","basicValue":3.01012}],[{"type":"Double","basicValue":4.01197}],[{"type":"Double","basicValue":5.9665}],[{"type":"Double","basicValue":8.01783}],[{"type":"Double","basicValue":10.9735}]]},"residuals":{"type":"Array","elements":[[{"type":"Double","basicValue":-0.0200879}],[{"type":"Double","basicValue":-0.0101206}],[{"type":"Double","basicValue":-0.0119703}],[{"type":"Double","basicValue":0.0334958}],[{"type":"Double","basicValue":-0.017833}],[{"type":"Double","basicValue":0.0265161}]]},"feature_importances":{"type":"Array","elements":[[{"type":"Double","basicValue":0.752868}],[{"type":"Double","basicValue":0.247132}]]},"estimator_count":{"type":"Double","basicValue":60}}}
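The same fit can be reproduced outside Excel by calling scikit-learn's GradientBoostingRegressor directly with the hyperparameters from the formula above (exact metric values may vary slightly across scikit-learn versions):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

X = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2, 1], [2, 2]], dtype=float)
y = np.array([1.0, 3.0, 4.0, 6.0, 8.0, 11.0])

model = GradientBoostingRegressor(n_estimators=60, learning_rate=0.1,
                                  max_depth=2, subsample=1.0,
                                  random_state=0).fit(X, y)
r2 = model.score(X, y)                      # training R^2, close to 1 here
importances = model.feature_importances_    # sums to 1 across the two features
```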
Example 2: Flatten a single-row numeric target range for gradient boosting regression
Inputs:
| data | target | n_estimators | learning_rate | max_depth | subsample | random_state | |||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | 4 | 9 | 16 | 25 | 80 | 0.1 | 2 | 1 | 0 |
| 1 | |||||||||||
| 2 | |||||||||||
| 3 | |||||||||||
| 4 | |||||||||||
| 5 |
Excel formula:
=GB_REGRESS({0;1;2;3;4;5}, {0,1,4,9,16,25}, 80, 0.1, 2, 1, 0)
Expected output:
{"type":"Double","basicValue":1,"properties":{"training_r2":{"type":"Double","basicValue":1},"mean_squared_error":{"type":"Double","basicValue":0.00000939573},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":1},"predictions":{"type":"Array","elements":[[{"type":"Double","basicValue":0.00469531}],[{"type":"Double","basicValue":1.00243}],[{"type":"Double","basicValue":4.00045}],[{"type":"Double","basicValue":8.99966}],[{"type":"Double","basicValue":15.9974}],[{"type":"Double","basicValue":24.9954}]]},"residuals":{"type":"Array","elements":[[{"type":"Double","basicValue":-0.00469531}],[{"type":"Double","basicValue":-0.00242558}],[{"type":"Double","basicValue":-0.000453668}],[{"type":"Double","basicValue":0.000338078}],[{"type":"Double","basicValue":0.00263296}],[{"type":"Double","basicValue":0.00460352}]]},"feature_importances":{"type":"Array","elements":[[{"type":"Double","basicValue":1}]]},"estimator_count":{"type":"Double","basicValue":80}}}
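The single-row target {0,1,4,9,16,25} is flattened to the same 1-D vector a single-column target would produce. A simplified sketch of that normalization rule (`flatten_target` is an illustrative stand-in for the wrapper's internal parsing, not the wrapper itself):

```python
import numpy as np

def flatten_target(value):
    # Accept a bare scalar, a flat list, a single row, or a single column,
    # and return a 1-D float array.
    if not isinstance(value, list):
        return np.array([float(value)])
    if all(not isinstance(item, list) for item in value):
        return np.array(value, dtype=float)                   # flat list
    if len(value) == 1:
        return np.array(value[0], dtype=float)                # single row
    return np.array([row[0] for row in value], dtype=float)   # single column

row_form = flatten_target([[0, 1, 4, 9, 16, 25]])
col_form = flatten_target([[0], [1], [4], [9], [16], [25]])
```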
Example 3: Fit gradient boosting regression with stochastic subsampling
Inputs:
| data | target | n_estimators | learning_rate | max_depth | subsample | random_state |
|---|---|---|---|---|---|---|
| 0 | 1 | 80 | 0.1 | 2 | 0.8 | 0 |
| 1 | 3 | |||||
| 2 | 5 | |||||
| 3 | 7 | |||||
| 4 | 9 | |||||
| 5 | 11 |
Excel formula:
=GB_REGRESS({0;1;2;3;4;5}, {1;3;5;7;9;11}, 80, 0.1, 2, 0.8, 0)
Expected output:
{"type":"Double","basicValue":0.999932,"properties":{"training_r2":{"type":"Double","basicValue":0.999932},"mean_squared_error":{"type":"Double","basicValue":0.000790625},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":1},"predictions":{"type":"Array","elements":[[{"type":"Double","basicValue":1.0257}],[{"type":"Double","basicValue":2.96377}],[{"type":"Double","basicValue":5.04344}],[{"type":"Double","basicValue":6.97499}],[{"type":"Double","basicValue":9.00422}],[{"type":"Double","basicValue":10.9845}]]},"residuals":{"type":"Array","elements":[[{"type":"Double","basicValue":-0.0256953}],[{"type":"Double","basicValue":0.0362316}],[{"type":"Double","basicValue":-0.0434387}],[{"type":"Double","basicValue":0.0250072}],[{"type":"Double","basicValue":-0.00422126}],[{"type":"Double","basicValue":0.0155135}]]},"feature_importances":{"type":"Array","elements":[[{"type":"Double","basicValue":1}]]},"estimator_count":{"type":"Double","basicValue":80}}}
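With subsample below 1, each boosting stage sees a random 80% of the rows, so random_state is what makes the fit repeatable. A quick check of that behavior (the `fit_once` helper is purely illustrative):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

X = np.arange(6, dtype=float).reshape(-1, 1)
y = 2.0 * X.ravel() + 1.0                  # targets 1, 3, 5, 7, 9, 11

def fit_once(seed):
    model = GradientBoostingRegressor(n_estimators=80, learning_rate=0.1,
                                      max_depth=2, subsample=0.8,
                                      random_state=seed).fit(X, y)
    return model.predict(X)

first, second = fit_once(0), fit_once(0)   # same seed -> identical predictions
```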
Example 4: Fit a gradient boosting regressor on a piecewise two-feature target
Inputs:
| data | target | n_estimators | learning_rate | max_depth | subsample | random_state | |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 80 | 0.1 | 2 | 1 | 0 |
| 0 | 1 | 1 | |||||
| 1 | 0 | 1 | |||||
| 1 | 1 | 3 | |||||
| 2 | 1 | 5 | |||||
| 2 | 2 | 8 |
Excel formula:
=GB_REGRESS({0,0;0,1;1,0;1,1;2,1;2,2}, {0;1;1;3;5;8}, 80, 0.1, 2, 1, 0)
Expected output:
{"type":"Double","basicValue":0.999835,"properties":{"training_r2":{"type":"Double","basicValue":0.999835},"mean_squared_error":{"type":"Double","basicValue":0.00126717},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":2},"predictions":{"type":"Array","elements":[[{"type":"Double","basicValue":0.00288363}],[{"type":"Double","basicValue":1.02073}],[{"type":"Double","basicValue":1.02255}],[{"type":"Double","basicValue":2.93344}],[{"type":"Double","basicValue":5.04196}],[{"type":"Double","basicValue":7.97844}]]},"residuals":{"type":"Array","elements":[[{"type":"Double","basicValue":-0.00288363}],[{"type":"Double","basicValue":-0.0207312}],[{"type":"Double","basicValue":-0.0225512}],[{"type":"Double","basicValue":0.0665643}],[{"type":"Double","basicValue":-0.0419602}],[{"type":"Double","basicValue":0.021562}]]},"feature_importances":{"type":"Array","elements":[[{"type":"Double","basicValue":0.836253}],[{"type":"Double","basicValue":0.163747}]]},"estimator_count":{"type":"Double","basicValue":80}}}
Python Code
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor as SklearnGradientBoostingRegressor
def gb_regress(data, target, n_estimators=100, learning_rate=0.1, max_depth=3, subsample=1, random_state=None):
    """
    Fit a gradient boosting regressor and return training predictions.

    See: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingRegressor.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        data (list[list]): 2D array of numeric feature data with rows as samples and columns as features.
        target (list[list]): Numeric target values as a single row, single column, or scalar when only one sample is present.
        n_estimators (int, optional): Number of boosting stages to fit. Default is 100.
        learning_rate (float, optional): Shrinkage factor applied to each boosting stage. Default is 0.1.
        max_depth (int, optional): Maximum depth of each individual regression tree. Default is 3.
        subsample (float, optional): Fraction of samples used to fit each boosting stage. Default is 1.
        random_state (int, optional): Integer seed for reproducible boosting and tree construction. Leave blank for the estimator default. Default is None.

    Returns:
        dict: Excel data type containing training $R^2$, predictions, residuals, and fitted feature importances.
    """
    def py(value):
        return value.item() if isinstance(value, np.generic) else value

    def cell(value):
        value = py(value)
        if isinstance(value, bool):
            return {"type": "Boolean", "basicValue": bool(value)}
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            return {"type": "Double", "basicValue": float(value)}
        return {"type": "String", "basicValue": str(value)}

    def col(values):
        return [[cell(value)] for value in values]

    def parse_data(value):
        value = [[value]] if not isinstance(value, list) else value
        if not isinstance(value, list) or not value or not all(isinstance(row, list) and row for row in value):
            return None, "Error: data must be a non-empty 2D list"
        if len({len(row) for row in value}) != 1:
            return None, "Error: data must be a rectangular 2D list"
        data_np = np.array(value, dtype=float)
        if data_np.ndim != 2 or data_np.size == 0:
            return None, "Error: data must be a non-empty 2D list"
        if not np.isfinite(data_np).all():
            return None, "Error: data must contain only finite numeric values"
        return data_np, None

    def parse_target(value, sample_count):
        if not isinstance(value, list):
            labels = [value]
        elif not value:
            return None, "Error: target must be non-empty"
        elif all(not isinstance(item, list) for item in value):
            labels = value
        elif len(value) == 1:
            labels = value[0]
        elif all(isinstance(row, list) and len(row) == 1 for row in value):
            labels = [row[0] for row in value]
        else:
            return None, "Error: target must be a single row or column"
        if len(labels) != sample_count:
            return None, "Error: target length must match sample count"
        parsed = []
        for item in labels:
            item = py(item)
            if isinstance(item, bool) or not isinstance(item, (int, float)):
                return None, "Error: target values must be finite numeric scalars"
            if not np.isfinite(float(item)):
                return None, "Error: target values must be finite numeric scalars"
            parsed.append(float(item))
        return np.array(parsed, dtype=float), None

    def flat_float_list(values):
        return [float(py(item)) for item in np.asarray(values).reshape(-1).tolist()]

    try:
        data_np, error = parse_data(data)
        if error:
            return error
        target_np, error = parse_target(target, data_np.shape[0])
        if error:
            return error
        if int(n_estimators) < 1:
            return "Error: n_estimators must be at least 1"
        if float(learning_rate) <= 0:
            return "Error: learning_rate must be greater than 0"
        if int(max_depth) < 1:
            return "Error: max_depth must be at least 1"
        if float(subsample) <= 0 or float(subsample) > 1:
            return "Error: subsample must be greater than 0 and at most 1"
        fitted = SklearnGradientBoostingRegressor(
            n_estimators=int(n_estimators),
            learning_rate=float(learning_rate),
            max_depth=int(max_depth),
            subsample=float(subsample),
            random_state=None if random_state in (None, "") else int(random_state)
        ).fit(data_np, target_np)
        prediction_array = np.asarray(fitted.predict(data_np), dtype=float)
        residual_array = target_np - prediction_array
        predictions = flat_float_list(prediction_array)
        residuals = flat_float_list(residual_array)
        training_r2 = float(fitted.score(data_np, target_np))
        mse = float(np.mean(np.square(residual_array)))
        return {
            "type": "Double",
            "basicValue": training_r2,
            "properties": {
                "training_r2": {"type": "Double", "basicValue": training_r2},
                "mean_squared_error": {"type": "Double", "basicValue": mse},
                "sample_count": {"type": "Double", "basicValue": float(data_np.shape[0])},
                "feature_count": {"type": "Double", "basicValue": float(data_np.shape[1])},
                "predictions": {"type": "Array", "elements": col(predictions)},
                "residuals": {"type": "Array", "elements": col(residuals)},
                "feature_importances": {"type": "Array", "elements": col(fitted.feature_importances_.tolist())},
                "estimator_count": {"type": "Double", "basicValue": float(fitted.n_estimators_)}
            }
        }
    except Exception as e:
        return f"Error: {str(e)}"