RF_REGRESS

Random forest regression averages predictions from many decision trees trained on resampled data and feature subsets. It is a strong nonlinear default for tabular regression and exposes feature-importance estimates.

The ensemble averages the individual tree predictions to reduce variance:

F(x) = \frac{1}{B} \sum_{b=1}^{B} T_b(x)

where B is the number of trees and T_b represents each decision tree fitted on a bootstrap sample.

This wrapper accepts rows as samples and a numeric target supplied as a single row or single column. It returns the training R^2 together with fitted predictions, residuals, and fitted feature importances.

Excel Usage

=RF_REGRESS(data, target, n_estimators, rf_reg_criterion, max_depth, min_samples_leaf, random_state)

data (list[list], required): 2D array of numeric feature data with rows as samples and columns as features.
target (list[list], required): Numeric target values as a single row, single column, or scalar when only one sample is present.
n_estimators (int, optional, default: 100): Number of trees in the forest.
rf_reg_criterion (str, optional, default: “squared_error”): Split quality measure used by each decision tree.
max_depth (int, optional, default: null): Maximum depth of each tree. Leave blank for unconstrained depth.
min_samples_leaf (int, optional, default: 1): Minimum number of samples required in each leaf.
random_state (int, optional, default: null): Integer seed for reproducible tree sampling. Leave blank for the estimator default.

Returns (dict): Excel data type containing training R^2, predictions, residuals, and fitted feature importances.

Example 1: Fit a random forest regressor on a two-feature linear trend

Inputs:

data		target	n_estimators	rf_reg_criterion	max_depth	min_samples_leaf	random_state
0	0	1	50	squared_error	4	1	0
1	0	3
0	1	4
1	1	6
2	1	8
2	2	11

Excel formula:

=RF_REGRESS({0,0;1,0;0,1;1,1;2,1;2,2}, {1;3;4;6;8;11}, 50, "squared_error", 4, 1, 0)

Expected output:

{"type":"Double","basicValue":0.962113,"properties":{"training_r2":{"type":"Double","basicValue":0.962113},"mean_squared_error":{"type":"Double","basicValue":0.4136},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":2},"predictions":{"type":"Array","elements":[[{"type":"Double","basicValue":1.88}],[{"type":"Double","basicValue":3.44}],[{"type":"Double","basicValue":4.6}],[{"type":"Double","basicValue":5.52}],[{"type":"Double","basicValue":7.96}],[{"type":"Double","basicValue":10.04}]]},"residuals":{"type":"Array","elements":[[{"type":"Double","basicValue":-0.88}],[{"type":"Double","basicValue":-0.44}],[{"type":"Double","basicValue":-0.6}],[{"type":"Double","basicValue":0.48}],[{"type":"Double","basicValue":0.04}],[{"type":"Double","basicValue":0.96}]]},"feature_importances":{"type":"Array","elements":[[{"type":"Double","basicValue":0.515282}],[{"type":"Double","basicValue":0.484718}]]},"estimator_count":{"type":"Double","basicValue":50}}}

Example 2: Flatten a single-row numeric target range for random forest regression

Inputs:

data	target						n_estimators	rf_reg_criterion	max_depth	min_samples_leaf	random_state
0	0	1	4	9	16	25	50	squared_error	5	1	0
1
2
3
4
5

Excel formula:

=RF_REGRESS({0;1;2;3;4;5}, {0,1,4,9,16,25}, 50, "squared_error", 5, 1, 0)

Expected output:

{"type":"Double","basicValue":0.974305,"properties":{"training_r2":{"type":"Double","basicValue":0.974305},"mean_squared_error":{"type":"Double","basicValue":2.03347},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":1},"predictions":{"type":"Array","elements":[[{"type":"Double","basicValue":0.88}],[{"type":"Double","basicValue":1.32}],[{"type":"Double","basicValue":4.04}],[{"type":"Double","basicValue":7.98}],[{"type":"Double","basicValue":14.72}],[{"type":"Double","basicValue":22.06}]]},"residuals":{"type":"Array","elements":[[{"type":"Double","basicValue":-0.88}],[{"type":"Double","basicValue":-0.32}],[{"type":"Double","basicValue":-0.04}],[{"type":"Double","basicValue":1.02}],[{"type":"Double","basicValue":1.28}],[{"type":"Double","basicValue":2.94}]]},"feature_importances":{"type":"Array","elements":[[{"type":"Double","basicValue":1}]]},"estimator_count":{"type":"Double","basicValue":50}}}

Example 3: Use absolute-error splits for one-feature regression

Inputs:

data	target	n_estimators	rf_reg_criterion	max_depth	min_samples_leaf	random_state
0	1	50	absolute_error	4	1	0
1	3
2	5
3	7
4	9
5	11

Excel formula:

=RF_REGRESS({0;1;2;3;4;5}, {1;3;5;7;9;11}, 50, "absolute_error", 4, 1, 0)

Expected output:

{"type":"Double","basicValue":0.974446,"properties":{"training_r2":{"type":"Double","basicValue":0.974446},"mean_squared_error":{"type":"Double","basicValue":0.298133},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":1},"predictions":{"type":"Array","elements":[[{"type":"Double","basicValue":1.96}],[{"type":"Double","basicValue":2.84}],[{"type":"Double","basicValue":4.84}],[{"type":"Double","basicValue":6.56}],[{"type":"Double","basicValue":8.6}],[{"type":"Double","basicValue":10.32}]]},"residuals":{"type":"Array","elements":[[{"type":"Double","basicValue":-0.96}],[{"type":"Double","basicValue":0.16}],[{"type":"Double","basicValue":0.16}],[{"type":"Double","basicValue":0.44}],[{"type":"Double","basicValue":0.4}],[{"type":"Double","basicValue":0.68}]]},"feature_importances":{"type":"Array","elements":[[{"type":"Double","basicValue":1}]]},"estimator_count":{"type":"Double","basicValue":50}}}

Example 4: Fit a random forest regressor on a piecewise two-feature target

Inputs:

data		target	n_estimators	rf_reg_criterion	max_depth	min_samples_leaf	random_state
0	0	0	50	friedman_mse	4	1	0
0	1	1
1	0	1
1	1	3
2	1	5
2	2	8

Excel formula:

=RF_REGRESS({0,0;0,1;1,0;1,1;2,1;2,2}, {0;1;1;3;5;8}, 50, "friedman_mse", 4, 1, 0)

Expected output:

{"type":"Double","basicValue":0.960078,"properties":{"training_r2":{"type":"Double","basicValue":0.960078},"mean_squared_error":{"type":"Double","basicValue":0.306067},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":2},"predictions":{"type":"Array","elements":[[{"type":"Double","basicValue":0.4}],[{"type":"Double","basicValue":1.5}],[{"type":"Double","basicValue":1.38}],[{"type":"Double","basicValue":2.4}],[{"type":"Double","basicValue":4.98}],[{"type":"Double","basicValue":7.04}]]},"residuals":{"type":"Array","elements":[[{"type":"Double","basicValue":-0.4}],[{"type":"Double","basicValue":-0.5}],[{"type":"Double","basicValue":-0.38}],[{"type":"Double","basicValue":0.6}],[{"type":"Double","basicValue":0.02}],[{"type":"Double","basicValue":0.96}]]},"feature_importances":{"type":"Array","elements":[[{"type":"Double","basicValue":0.681547}],[{"type":"Double","basicValue":0.318453}]]},"estimator_count":{"type":"Double","basicValue":50}}}

Python Code

import numpy as np
from sklearn.ensemble import RandomForestRegressor as SklearnRandomForestRegressor

def rf_regress(data, target, n_estimators=100, rf_reg_criterion='squared_error', max_depth=None, min_samples_leaf=1, random_state=None):
    """
    Fit a random forest regressor and return training predictions.

    See: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        data (list[list]): 2D array of numeric feature data with rows as samples and columns as features.
        target (list[list]): Numeric target values as a single row, single column, or scalar when only one sample is present.
        n_estimators (int, optional): Number of trees in the forest. Default is 100.
        rf_reg_criterion (str, optional): Split quality measure used by each decision tree. Valid options: Squared Error, Absolute Error, Friedman MSE. Default is 'squared_error'.
        max_depth (int, optional): Maximum depth of each tree. Leave blank for unconstrained depth. Default is None.
        min_samples_leaf (int, optional): Minimum number of samples required in each leaf. Default is 1.
        random_state (int, optional): Integer seed for reproducible tree sampling. Leave blank for the estimator default. Default is None.

    Returns:
        dict: Excel data type containing training $R^2$, predictions, residuals, and fitted feature importances.
    """
    def py(value):
        return value.item() if isinstance(value, np.generic) else value

    def cell(value):
        value = py(value)
        if isinstance(value, bool):
            return {"type": "Boolean", "basicValue": bool(value)}
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            return {"type": "Double", "basicValue": float(value)}
        return {"type": "String", "basicValue": str(value)}

    def col(values):
        return [[cell(value)] for value in values]

    def parse_data(value):
        value = [[value]] if not isinstance(value, list) else value
        if not isinstance(value, list) or not value or not all(isinstance(row, list) and row for row in value):
            return None, "Error: data must be a non-empty 2D list"
        if len({len(row) for row in value}) != 1:
            return None, "Error: data must be a rectangular 2D list"
        data_np = np.array(value, dtype=float)
        if data_np.ndim != 2 or data_np.size == 0:
            return None, "Error: data must be a non-empty 2D list"
        if not np.isfinite(data_np).all():
            return None, "Error: data must contain only finite numeric values"
        return data_np, None

    def parse_target(value, sample_count):
        if not isinstance(value, list):
            labels = [value]
        elif not value:
            return None, "Error: target must be non-empty"
        elif all(not isinstance(item, list) for item in value):
            labels = value
        elif len(value) == 1:
            labels = value[0]
        elif all(isinstance(row, list) and len(row) == 1 for row in value):
            labels = [row[0] for row in value]
        else:
            return None, "Error: target must be a single row or column"

        if len(labels) != sample_count:
            return None, "Error: target length must match sample count"

        parsed = []
        for item in labels:
            item = py(item)
            if isinstance(item, bool) or not isinstance(item, (int, float)):
                return None, "Error: target values must be finite numeric scalars"
            if not np.isfinite(float(item)):
                return None, "Error: target values must be finite numeric scalars"
            parsed.append(float(item))
        return np.array(parsed, dtype=float), None

    def flat_float_list(values):
        return [float(py(item)) for item in np.asarray(values).reshape(-1).tolist()]

    try:
        data_np, error = parse_data(data)
        if error:
            return error

        target_np, error = parse_target(target, data_np.shape[0])
        if error:
            return error

        if int(n_estimators) < 1:
            return "Error: n_estimators must be at least 1"
        criterion_value = str(rf_reg_criterion).strip().lower()
        if criterion_value not in {"squared_error", "absolute_error", "friedman_mse"}:
            return "Error: rf_reg_criterion must be 'squared_error', 'absolute_error', or 'friedman_mse'"
        depth = None if max_depth in (None, "") else int(max_depth)
        if depth is not None and depth < 1:
            return "Error: max_depth must be at least 1 when provided"
        if int(min_samples_leaf) < 1:
            return "Error: min_samples_leaf must be at least 1"

        fitted = SklearnRandomForestRegressor(
            n_estimators=int(n_estimators),
            criterion=criterion_value,
            max_depth=depth,
            min_samples_leaf=int(min_samples_leaf),
            random_state=None if random_state in (None, "") else int(random_state)
        ).fit(data_np, target_np)

        prediction_array = np.asarray(fitted.predict(data_np), dtype=float)
        residual_array = target_np - prediction_array
        predictions = flat_float_list(prediction_array)
        residuals = flat_float_list(residual_array)
        training_r2 = float(fitted.score(data_np, target_np))
        mse = float(np.mean(np.square(residual_array)))

        return {
            "type": "Double",
            "basicValue": training_r2,
            "properties": {
                "training_r2": {"type": "Double", "basicValue": training_r2},
                "mean_squared_error": {"type": "Double", "basicValue": mse},
                "sample_count": {"type": "Double", "basicValue": float(data_np.shape[0])},
                "feature_count": {"type": "Double", "basicValue": float(data_np.shape[1])},
                "predictions": {"type": "Array", "elements": col(predictions)},
                "residuals": {"type": "Array", "elements": col(residuals)},
                "feature_importances": {"type": "Array", "elements": col(fitted.feature_importances_.tolist())},
                "estimator_count": {"type": "Double", "basicValue": float(len(fitted.estimators_))}
            }
        }
    except Exception as e:
        return f"Error: {str(e)}"

Online Calculator

data *

2D array of numeric feature data with rows as samples and columns as features.

target *

Numeric target values as a single row, single column, or scalar when only one sample is present.

n_estimators

Number of trees in the forest.

rf_reg_criterion

Split quality measure used by each decision tree.

max_depth

Maximum depth of each tree. Leave blank for unconstrained depth.

min_samples_leaf

Minimum number of samples required in each leaf.

random_state

Integer seed for reproducible tree sampling. Leave blank for the estimator default.