GB_CLASSIFY

Gradient boosting classification builds a sequence of shallow decision trees that iteratively correct earlier mistakes. At each stage m, a new weak learner h_m(x) is fit to the negative gradient of the loss (the pseudo-residuals of the current model) and added to the ensemble with a scaled weight:

F_{m}(x) = F_{m-1}(x) + \gamma_m h_m(x)

This flexible nonlinear classifier is well-suited for tabular data and exposes feature-importance estimates based on the total impurity reduction each feature contributes across all splits in the ensemble.
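As a rough illustration of the stagewise update (assuming scikit-learn is available; this snippet is not part of the wrapper below), `staged_predict` exposes the predictions of the growing ensemble after each boosting stage:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Two well-separated groups, matching Example 1 below
X = np.array([[0, 0], [0, 1], [1, 0], [2, 2], [2, 3], [3, 2]], dtype=float)
y = np.array(["cold", "cold", "cold", "hot", "hot", "hot"])

model = GradientBoostingClassifier(n_estimators=50, learning_rate=0.1,
                                   max_depth=2, random_state=0).fit(X, y)

# Training accuracy of the partial ensemble F_m after each stage m = 1..50
staged_acc = [float(np.mean(pred == y)) for pred in model.staged_predict(X)]
print(staged_acc[-1])  # the full 50-tree ensemble fits this training set exactly
```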

This wrapper accepts rows as samples and a target supplied as a single row or single column. It returns training accuracy together with predicted labels, class counts, class probabilities, and fitted feature importances.

Excel Usage

=GB_CLASSIFY(data, target, n_estimators, learning_rate, max_depth, subsample, random_state)
  • data (list[list], required): 2D array of numeric feature data with rows as samples and columns as features.
  • target (list[list], required): Target labels as a single row, single column, or scalar when only one sample is present.
  • n_estimators (int, optional, default: 100): Number of boosting stages to fit.
  • learning_rate (float, optional, default: 0.1): Shrinkage factor applied to each boosting stage.
  • max_depth (int, optional, default: 3): Maximum depth of each individual regression tree.
  • subsample (float, optional, default: 1): Fraction of samples used to fit each boosting stage.
  • random_state (int, optional, default: null): Integer seed for reproducible boosting and tree construction. Leave blank for the estimator default.

Returns (dict): Excel data type containing training accuracy, predictions, probabilities, and fitted feature importances.

Example 1: Fit a gradient boosting classifier for two string-labeled groups

Inputs:

data target n_estimators learning_rate max_depth subsample random_state
0 0 cold 50 0.1 2 1 0
0 1 cold
1 0 cold
2 2 hot
2 3 hot
3 2 hot

Excel formula:

=GB_CLASSIFY({0,0;0,1;1,0;2,2;2,3;3,2}, {"cold";"cold";"cold";"hot";"hot";"hot"}, 50, 0.1, 2, 1, 0)

Expected output:

{"type":"Double","basicValue":1,"properties":{"accuracy":{"type":"Double","basicValue":1},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":2},"class_count":{"type":"Double","basicValue":2},"classes":{"type":"Array","elements":[[{"type":"String","basicValue":"cold"}],[{"type":"String","basicValue":"hot"}]]},"predictions":{"type":"Array","elements":[[{"type":"String","basicValue":"cold"}],[{"type":"String","basicValue":"cold"}],[{"type":"String","basicValue":"cold"}],[{"type":"String","basicValue":"hot"}],[{"type":"String","basicValue":"hot"}],[{"type":"String","basicValue":"hot"}]]},"prediction_counts":{"type":"Array","elements":[[{"type":"String","basicValue":"class"},{"type":"String","basicValue":"count"}],[{"type":"String","basicValue":"cold"},{"type":"Double","basicValue":3}],[{"type":"String","basicValue":"hot"},{"type":"Double","basicValue":3}]]},"probabilities":{"type":"Array","elements":[[{"type":"Double","basicValue":0.996749},{"type":"Double","basicValue":0.00325104}],[{"type":"Double","basicValue":0.996749},{"type":"Double","basicValue":0.00325104}],[{"type":"Double","basicValue":0.996749},{"type":"Double","basicValue":0.00325104}],[{"type":"Double","basicValue":0.00325104},{"type":"Double","basicValue":0.996749}],[{"type":"Double","basicValue":0.00325104},{"type":"Double","basicValue":0.996749}],[{"type":"Double","basicValue":0.00325104},{"type":"Double","basicValue":0.996749}]]},"feature_importances":{"type":"Array","elements":[[{"type":"Double","basicValue":0.293415}],[{"type":"Double","basicValue":0.706585}]]},"estimator_count":{"type":"Double","basicValue":50}}}

Example 2: Fit gradient boosting for one-dimensional numeric labels

Inputs:

data target n_estimators learning_rate max_depth subsample random_state
0 0 50 0.1 2 1 0
0.2 0
0.4 0
1.2 1
1.4 1
1.6 1

Excel formula:

=GB_CLASSIFY({0;0.2;0.4;1.2;1.4;1.6}, {0;0;0;1;1;1}, 50, 0.1, 2, 1, 0)

Expected output:

{"type":"Double","basicValue":1,"properties":{"accuracy":{"type":"Double","basicValue":1},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":1},"class_count":{"type":"Double","basicValue":2},"classes":{"type":"Array","elements":[[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1}]]},"predictions":{"type":"Array","elements":[[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":1}]]},"prediction_counts":{"type":"Array","elements":[[{"type":"String","basicValue":"class"},{"type":"String","basicValue":"count"}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":3}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":3}]]},"probabilities":{"type":"Array","elements":[[{"type":"Double","basicValue":0.996749},{"type":"Double","basicValue":0.00325104}],[{"type":"Double","basicValue":0.996749},{"type":"Double","basicValue":0.00325104}],[{"type":"Double","basicValue":0.996749},{"type":"Double","basicValue":0.00325104}],[{"type":"Double","basicValue":0.00325104},{"type":"Double","basicValue":0.996749}],[{"type":"Double","basicValue":0.00325104},{"type":"Double","basicValue":0.996749}],[{"type":"Double","basicValue":0.00325104},{"type":"Double","basicValue":0.996749}]]},"feature_importances":{"type":"Array","elements":[[{"type":"Double","basicValue":1}]]},"estimator_count":{"type":"Double","basicValue":50}}}

Example 3: Fit a gradient boosting classifier for three separated groups

Inputs:

data target n_estimators learning_rate max_depth subsample random_state
0 0 left 50 0.1 2 1 0
0.2 0.1 left
4 4 center
4.2 3.9 center
8 0 right
8.2 0.1 right

Excel formula:

=GB_CLASSIFY({0,0;0.2,0.1;4,4;4.2,3.9;8,0;8.2,0.1}, {"left";"left";"center";"center";"right";"right"}, 50, 0.1, 2, 1, 0)

Expected output:

{"type":"Double","basicValue":1,"properties":{"accuracy":{"type":"Double","basicValue":1},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":2},"class_count":{"type":"Double","basicValue":3},"classes":{"type":"Array","elements":[[{"type":"String","basicValue":"center"}],[{"type":"String","basicValue":"left"}],[{"type":"String","basicValue":"right"}]]},"predictions":{"type":"Array","elements":[[{"type":"String","basicValue":"left"}],[{"type":"String","basicValue":"left"}],[{"type":"String","basicValue":"center"}],[{"type":"String","basicValue":"center"}],[{"type":"String","basicValue":"right"}],[{"type":"String","basicValue":"right"}]]},"prediction_counts":{"type":"Array","elements":[[{"type":"String","basicValue":"class"},{"type":"String","basicValue":"count"}],[{"type":"String","basicValue":"center"},{"type":"Double","basicValue":2}],[{"type":"String","basicValue":"left"},{"type":"Double","basicValue":2}],[{"type":"String","basicValue":"right"},{"type":"Double","basicValue":2}]]},"probabilities":{"type":"Array","elements":[[{"type":"Double","basicValue":0.000505015},{"type":"Double","basicValue":0.99899},{"type":"Double","basicValue":0.000505015}],[{"type":"Double","basicValue":0.000505015},{"type":"Double","basicValue":0.99899},{"type":"Double","basicValue":0.000505015}],[{"type":"Double","basicValue":0.99899},{"type":"Double","basicValue":0.000505015},{"type":"Double","basicValue":0.000505015}],[{"type":"Double","basicValue":0.99899},{"type":"Double","basicValue":0.000505015},{"type":"Double","basicValue":0.000505015}],[{"type":"Double","basicValue":0.000505015},{"type":"Double","basicValue":0.000505015},{"type":"Double","basicValue":0.99899}],[{"type":"Double","basicValue":0.000505015},{"type":"Double","basicValue":0.000505015},{"type":"Double","basicValue":0.99899}]]},"feature_importances":{"type":"Array","elements":[[{"type":"Double","basicValue":0.666667}],[{"type":"Double","basicValue":0.333333}]]},"estimator_count":{"type":"Double","basicValue":50}}}

Example 4: Flatten a single-row boolean target range for gradient boosting classification

Inputs:

data target n_estimators learning_rate max_depth subsample random_state
0 false false false true true true 50 0.1 2 1 0
0.3
0.6
1.4
1.7
2

Excel formula:

=GB_CLASSIFY({0;0.3;0.6;1.4;1.7;2}, {FALSE,FALSE,FALSE,TRUE,TRUE,TRUE}, 50, 0.1, 2, 1, 0)

Expected output:

{"type":"Double","basicValue":1,"properties":{"accuracy":{"type":"Double","basicValue":1},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":1},"class_count":{"type":"Double","basicValue":2},"classes":{"type":"Array","elements":[[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":true}]]},"predictions":{"type":"Array","elements":[[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":true}],[{"type":"Boolean","basicValue":true}],[{"type":"Boolean","basicValue":true}]]},"prediction_counts":{"type":"Array","elements":[[{"type":"String","basicValue":"class"},{"type":"String","basicValue":"count"}],[{"type":"Boolean","basicValue":false},{"type":"Double","basicValue":3}],[{"type":"Boolean","basicValue":true},{"type":"Double","basicValue":3}]]},"probabilities":{"type":"Array","elements":[[{"type":"Double","basicValue":0.996749},{"type":"Double","basicValue":0.00325104}],[{"type":"Double","basicValue":0.996749},{"type":"Double","basicValue":0.00325104}],[{"type":"Double","basicValue":0.996749},{"type":"Double","basicValue":0.00325104}],[{"type":"Double","basicValue":0.00325104},{"type":"Double","basicValue":0.996749}],[{"type":"Double","basicValue":0.00325104},{"type":"Double","basicValue":0.996749}],[{"type":"Double","basicValue":0.00325104},{"type":"Double","basicValue":0.996749}]]},"feature_importances":{"type":"Array","elements":[[{"type":"Double","basicValue":1}]]},"estimator_count":{"type":"Double","basicValue":50}}}
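The flattening behavior in Example 4 can be sketched as follows. This hypothetical helper mirrors the target-shape rules stated above (scalar, flat list, single row, or single column); it is an illustration, not the wrapper's actual validation code:

```python
def flatten_target(value):
    """Return target labels as a flat list, accepting the shapes the doc describes."""
    if not isinstance(value, list):
        return [value]                      # scalar -> one-sample target
    if all(not isinstance(item, list) for item in value):
        return list(value)                  # already flat
    if len(value) == 1:
        return list(value[0])               # single row: [[a, b, c]]
    return [row[0] for row in value]        # single column: [[a], [b], [c]]

print(flatten_target([[False, False, True]]))  # [False, False, True]
print(flatten_target([[0], [1], [1]]))         # [0, 1, 1]
```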

Python Code

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier as SklearnGradientBoostingClassifier

def gb_classify(data, target, n_estimators=100, learning_rate=0.1, max_depth=3, subsample=1, random_state=None):
    """
    Fit a gradient boosting classifier and return training predictions.

    See: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        data (list[list]): 2D array of numeric feature data with rows as samples and columns as features.
        target (list[list]): Target labels as a single row, single column, or scalar when only one sample is present.
        n_estimators (int, optional): Number of boosting stages to fit. Default is 100.
        learning_rate (float, optional): Shrinkage factor applied to each boosting stage. Default is 0.1.
        max_depth (int, optional): Maximum depth of each individual regression tree. Default is 3.
        subsample (float, optional): Fraction of samples used to fit each boosting stage. Default is 1.
        random_state (int, optional): Integer seed for reproducible boosting and tree construction. Leave blank for the estimator default. Default is None.

    Returns:
        dict: Excel data type containing training accuracy, predictions, probabilities, and fitted feature importances.
    """
    def py(value):
        return value.item() if isinstance(value, np.generic) else value

    def cell(value):
        value = py(value)
        if isinstance(value, bool):
            return {"type": "Boolean", "basicValue": bool(value)}
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            return {"type": "Double", "basicValue": float(value)}
        return {"type": "String", "basicValue": str(value)}

    def col(values):
        return [[cell(value)] for value in values]

    def mat(values):
        return [[cell(value) for value in row] for row in values]

    def parse_data(value):
        value = [[value]] if not isinstance(value, list) else value
        if not isinstance(value, list) or not value or not all(isinstance(row, list) and row for row in value):
            return None, "Error: data must be a non-empty 2D list"
        if len({len(row) for row in value}) != 1:
            return None, "Error: data must be a rectangular 2D list"
        data_np = np.array(value, dtype=float)
        if data_np.ndim != 2 or data_np.size == 0:
            return None, "Error: data must be a non-empty 2D list"
        if not np.isfinite(data_np).all():
            return None, "Error: data must contain only finite numeric values"
        return data_np, None

    def parse_target(value, sample_count):
        if not isinstance(value, list):
            labels = [value]
        elif not value:
            return None, "Error: target must be non-empty"
        elif all(not isinstance(item, list) for item in value):
            labels = value
        elif len(value) == 1:
            labels = value[0]
        elif all(isinstance(row, list) and len(row) == 1 for row in value):
            labels = [row[0] for row in value]
        else:
            return None, "Error: target must be a single row or column"

        if len(labels) != sample_count:
            return None, "Error: target length must match sample count"

        parsed = []
        classes = []
        for item in labels:
            item = py(item)
            if isinstance(item, str):
                if not item.strip():
                    return None, "Error: target labels must not be blank"
            elif isinstance(item, bool):
                item = bool(item)
            elif isinstance(item, (int, float)) and not isinstance(item, bool):
                if not np.isfinite(float(item)):
                    return None, "Error: target labels must be finite"
                item = float(item) if isinstance(item, float) else int(item)
            else:
                return None, "Error: target labels must be scalar string, boolean, or numeric values"
            parsed.append(item)
            if not any(type(existing) is type(item) and existing == item for existing in classes):
                classes.append(item)

        if len(classes) < 2:
            return None, "Error: target must contain at least 2 classes"
        return parsed, None

    def count_table(predictions, classes):
        rows = [[{"type": "String", "basicValue": "class"}, {"type": "String", "basicValue": "count"}]]
        for class_label in classes:
            count = sum(type(prediction) is type(class_label) and prediction == class_label for prediction in predictions)
            rows.append([cell(class_label), {"type": "Double", "basicValue": float(count)}])
        return rows

    try:
        data_np, error = parse_data(data)
        if error:
            return error

        target_values, error = parse_target(target, data_np.shape[0])
        if error:
            return error

        if int(n_estimators) < 1:
            return "Error: n_estimators must be at least 1"
        if float(learning_rate) <= 0:
            return "Error: learning_rate must be greater than 0"
        if int(max_depth) < 1:
            return "Error: max_depth must be at least 1"
        if float(subsample) <= 0 or float(subsample) > 1:
            return "Error: subsample must be greater than 0 and at most 1"

        fitted = SklearnGradientBoostingClassifier(
            n_estimators=int(n_estimators),
            learning_rate=float(learning_rate),
            max_depth=int(max_depth),
            subsample=float(subsample),
            random_state=None if random_state in (None, "") else int(random_state)
        ).fit(data_np, target_values)

        prediction_array = fitted.predict(data_np)
        predictions = [py(item) for item in prediction_array.tolist()]
        classes = [py(item) for item in fitted.classes_.tolist()]
        accuracy = float(np.mean([
            type(prediction) is type(actual) and prediction == actual
            for prediction, actual in zip(predictions, target_values)
        ]))

        return {
            "type": "Double",
            "basicValue": accuracy,
            "properties": {
                "accuracy": {"type": "Double", "basicValue": accuracy},
                "sample_count": {"type": "Double", "basicValue": float(data_np.shape[0])},
                "feature_count": {"type": "Double", "basicValue": float(data_np.shape[1])},
                "class_count": {"type": "Double", "basicValue": float(len(classes))},
                "classes": {"type": "Array", "elements": col(classes)},
                "predictions": {"type": "Array", "elements": col(predictions)},
                "prediction_counts": {"type": "Array", "elements": count_table(predictions, classes)},
                "probabilities": {"type": "Array", "elements": mat(fitted.predict_proba(data_np).tolist())},
                "feature_importances": {"type": "Array", "elements": col(fitted.feature_importances_.tolist())},
                "estimator_count": {"type": "Double", "basicValue": float(fitted.n_estimators_)}
            }
        }
    except Exception as e:
        return f"Error: {str(e)}"
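The core fit behind the wrapper can be checked standalone. This sketch reproduces Example 2 with scikit-learn directly (it omits the wrapper's input validation and Excel serialization):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# One-dimensional features and numeric labels from Example 2
X = np.array([[0.0], [0.2], [0.4], [1.2], [1.4], [1.6]])
y = np.array([0, 0, 0, 1, 1, 1])

fitted = GradientBoostingClassifier(n_estimators=50, learning_rate=0.1,
                                    max_depth=2, subsample=1.0,
                                    random_state=0).fit(X, y)

accuracy = float(np.mean(fitted.predict(X) == y))
proba = fitted.predict_proba(X)
print(accuracy)     # 1.0 on this separable training set
print(proba.shape)  # (6, 2): one row per sample, one column per class
```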

Online Calculator
