RF_REGRESS

Random forest regression averages predictions from many decision trees trained on resampled data and feature subsets. It is a strong nonlinear default for tabular regression and exposes feature-importance estimates.

The ensemble averages the individual tree predictions to reduce variance:

F(x) = \frac{1}{B} \sum_{b=1}^{B} T_b(x)

where B is the number of trees and T_b represents each decision tree fitted on a bootstrap sample.

This wrapper accepts rows as samples and a numeric target supplied as a single row or single column. It returns the training R^2 together with fitted predictions, residuals, and fitted feature importances.

Excel Usage

=RF_REGRESS(data, target, n_estimators, rf_reg_criterion, max_depth, min_samples_leaf, random_state)
  • data (list[list], required): 2D array of numeric feature data with rows as samples and columns as features.
  • target (list[list], required): Numeric target values as a single row, single column, or scalar when only one sample is present.
  • n_estimators (int, optional, default: 100): Number of trees in the forest.
  • rf_reg_criterion (str, optional, default: “squared_error”): Split quality measure used by each decision tree.
  • max_depth (int, optional, default: null): Maximum depth of each tree. Leave blank for unconstrained depth.
  • min_samples_leaf (int, optional, default: 1): Minimum number of samples required in each leaf.
  • random_state (int, optional, default: null): Integer seed for reproducible tree sampling. Leave blank for the estimator default.

Returns (dict): Excel data type containing training R^2, predictions, residuals, and fitted feature importances.

Example 1: Fit a random forest regressor on a two-feature linear trend

Inputs:

data target n_estimators rf_reg_criterion max_depth min_samples_leaf random_state
0 0 1 50 squared_error 4 1 0
1 0 3
0 1 4
1 1 6
2 1 8
2 2 11

Excel formula:

=RF_REGRESS({0,0;1,0;0,1;1,1;2,1;2,2}, {1;3;4;6;8;11}, 50, "squared_error", 4, 1, 0)

Expected output:

{"type":"Double","basicValue":0.962113,"properties":{"training_r2":{"type":"Double","basicValue":0.962113},"mean_squared_error":{"type":"Double","basicValue":0.4136},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":2},"predictions":{"type":"Array","elements":[[{"type":"Double","basicValue":1.88}],[{"type":"Double","basicValue":3.44}],[{"type":"Double","basicValue":4.6}],[{"type":"Double","basicValue":5.52}],[{"type":"Double","basicValue":7.96}],[{"type":"Double","basicValue":10.04}]]},"residuals":{"type":"Array","elements":[[{"type":"Double","basicValue":-0.88}],[{"type":"Double","basicValue":-0.44}],[{"type":"Double","basicValue":-0.6}],[{"type":"Double","basicValue":0.48}],[{"type":"Double","basicValue":0.04}],[{"type":"Double","basicValue":0.96}]]},"feature_importances":{"type":"Array","elements":[[{"type":"Double","basicValue":0.515282}],[{"type":"Double","basicValue":0.484718}]]},"estimator_count":{"type":"Double","basicValue":50}}}

Example 2: Flatten a single-row numeric target range for random forest regression

Inputs:

data target n_estimators rf_reg_criterion max_depth min_samples_leaf random_state
0 0 1 4 9 16 25 50 squared_error 5 1 0
1
2
3
4
5

Excel formula:

=RF_REGRESS({0;1;2;3;4;5}, {0,1,4,9,16,25}, 50, "squared_error", 5, 1, 0)

Expected output:

{"type":"Double","basicValue":0.974305,"properties":{"training_r2":{"type":"Double","basicValue":0.974305},"mean_squared_error":{"type":"Double","basicValue":2.03347},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":1},"predictions":{"type":"Array","elements":[[{"type":"Double","basicValue":0.88}],[{"type":"Double","basicValue":1.32}],[{"type":"Double","basicValue":4.04}],[{"type":"Double","basicValue":7.98}],[{"type":"Double","basicValue":14.72}],[{"type":"Double","basicValue":22.06}]]},"residuals":{"type":"Array","elements":[[{"type":"Double","basicValue":-0.88}],[{"type":"Double","basicValue":-0.32}],[{"type":"Double","basicValue":-0.04}],[{"type":"Double","basicValue":1.02}],[{"type":"Double","basicValue":1.28}],[{"type":"Double","basicValue":2.94}]]},"feature_importances":{"type":"Array","elements":[[{"type":"Double","basicValue":1}]]},"estimator_count":{"type":"Double","basicValue":50}}}

Example 3: Use absolute-error splits for one-feature regression

Inputs:

data target n_estimators rf_reg_criterion max_depth min_samples_leaf random_state
0 1 50 absolute_error 4 1 0
1 3
2 5
3 7
4 9
5 11

Excel formula:

=RF_REGRESS({0;1;2;3;4;5}, {1;3;5;7;9;11}, 50, "absolute_error", 4, 1, 0)

Expected output:

{"type":"Double","basicValue":0.974446,"properties":{"training_r2":{"type":"Double","basicValue":0.974446},"mean_squared_error":{"type":"Double","basicValue":0.298133},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":1},"predictions":{"type":"Array","elements":[[{"type":"Double","basicValue":1.96}],[{"type":"Double","basicValue":2.84}],[{"type":"Double","basicValue":4.84}],[{"type":"Double","basicValue":6.56}],[{"type":"Double","basicValue":8.6}],[{"type":"Double","basicValue":10.32}]]},"residuals":{"type":"Array","elements":[[{"type":"Double","basicValue":-0.96}],[{"type":"Double","basicValue":0.16}],[{"type":"Double","basicValue":0.16}],[{"type":"Double","basicValue":0.44}],[{"type":"Double","basicValue":0.4}],[{"type":"Double","basicValue":0.68}]]},"feature_importances":{"type":"Array","elements":[[{"type":"Double","basicValue":1}]]},"estimator_count":{"type":"Double","basicValue":50}}}

Example 4: Fit a random forest regressor on a piecewise two-feature target

Inputs:

data target n_estimators rf_reg_criterion max_depth min_samples_leaf random_state
0 0 0 50 friedman_mse 4 1 0
0 1 1
1 0 1
1 1 3
2 1 5
2 2 8

Excel formula:

=RF_REGRESS({0,0;0,1;1,0;1,1;2,1;2,2}, {0;1;1;3;5;8}, 50, "friedman_mse", 4, 1, 0)

Expected output:

{"type":"Double","basicValue":0.960078,"properties":{"training_r2":{"type":"Double","basicValue":0.960078},"mean_squared_error":{"type":"Double","basicValue":0.306067},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":2},"predictions":{"type":"Array","elements":[[{"type":"Double","basicValue":0.4}],[{"type":"Double","basicValue":1.5}],[{"type":"Double","basicValue":1.38}],[{"type":"Double","basicValue":2.4}],[{"type":"Double","basicValue":4.98}],[{"type":"Double","basicValue":7.04}]]},"residuals":{"type":"Array","elements":[[{"type":"Double","basicValue":-0.4}],[{"type":"Double","basicValue":-0.5}],[{"type":"Double","basicValue":-0.38}],[{"type":"Double","basicValue":0.6}],[{"type":"Double","basicValue":0.02}],[{"type":"Double","basicValue":0.96}]]},"feature_importances":{"type":"Array","elements":[[{"type":"Double","basicValue":0.681547}],[{"type":"Double","basicValue":0.318453}]]},"estimator_count":{"type":"Double","basicValue":50}}}

Python Code

import numpy as np
from sklearn.ensemble import RandomForestRegressor as SklearnRandomForestRegressor

def rf_regress(data, target, n_estimators=100, rf_reg_criterion='squared_error', max_depth=None, min_samples_leaf=1, random_state=None):
    """
    Fit a random forest regressor and return training predictions.

    See: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        data (list[list]): 2D array of numeric feature data with rows as samples and columns as features.
        target (list[list]): Numeric target values as a single row, single column, or scalar when only one sample is present.
        n_estimators (int, optional): Number of trees in the forest. Default is 100.
        rf_reg_criterion (str, optional): Split quality measure used by each decision tree. Valid options: Squared Error, Absolute Error, Friedman MSE. Default is 'squared_error'.
        max_depth (int, optional): Maximum depth of each tree. Leave blank for unconstrained depth. Default is None.
        min_samples_leaf (int, optional): Minimum number of samples required in each leaf. Default is 1.
        random_state (int, optional): Integer seed for reproducible tree sampling. Leave blank for the estimator default. Default is None.

    Returns:
        dict: Excel data type containing training $R^2$, predictions, residuals, and fitted feature importances.
    """
    def py(value):
        return value.item() if isinstance(value, np.generic) else value

    def cell(value):
        value = py(value)
        if isinstance(value, bool):
            return {"type": "Boolean", "basicValue": bool(value)}
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            return {"type": "Double", "basicValue": float(value)}
        return {"type": "String", "basicValue": str(value)}

    def col(values):
        return [[cell(value)] for value in values]

    def parse_data(value):
        value = [[value]] if not isinstance(value, list) else value
        if not isinstance(value, list) or not value or not all(isinstance(row, list) and row for row in value):
            return None, "Error: data must be a non-empty 2D list"
        if len({len(row) for row in value}) != 1:
            return None, "Error: data must be a rectangular 2D list"
        data_np = np.array(value, dtype=float)
        if data_np.ndim != 2 or data_np.size == 0:
            return None, "Error: data must be a non-empty 2D list"
        if not np.isfinite(data_np).all():
            return None, "Error: data must contain only finite numeric values"
        return data_np, None

    def parse_target(value, sample_count):
        if not isinstance(value, list):
            labels = [value]
        elif not value:
            return None, "Error: target must be non-empty"
        elif all(not isinstance(item, list) for item in value):
            labels = value
        elif len(value) == 1:
            labels = value[0]
        elif all(isinstance(row, list) and len(row) == 1 for row in value):
            labels = [row[0] for row in value]
        else:
            return None, "Error: target must be a single row or column"

        if len(labels) != sample_count:
            return None, "Error: target length must match sample count"

        parsed = []
        for item in labels:
            item = py(item)
            if isinstance(item, bool) or not isinstance(item, (int, float)):
                return None, "Error: target values must be finite numeric scalars"
            if not np.isfinite(float(item)):
                return None, "Error: target values must be finite numeric scalars"
            parsed.append(float(item))
        return np.array(parsed, dtype=float), None

    def flat_float_list(values):
        return [float(py(item)) for item in np.asarray(values).reshape(-1).tolist()]

    try:
        data_np, error = parse_data(data)
        if error:
            return error

        target_np, error = parse_target(target, data_np.shape[0])
        if error:
            return error

        if int(n_estimators) < 1:
            return "Error: n_estimators must be at least 1"
        criterion_value = str(rf_reg_criterion).strip().lower()
        if criterion_value not in {"squared_error", "absolute_error", "friedman_mse"}:
            return "Error: rf_reg_criterion must be 'squared_error', 'absolute_error', or 'friedman_mse'"
        depth = None if max_depth in (None, "") else int(max_depth)
        if depth is not None and depth < 1:
            return "Error: max_depth must be at least 1 when provided"
        if int(min_samples_leaf) < 1:
            return "Error: min_samples_leaf must be at least 1"

        fitted = SklearnRandomForestRegressor(
            n_estimators=int(n_estimators),
            criterion=criterion_value,
            max_depth=depth,
            min_samples_leaf=int(min_samples_leaf),
            random_state=None if random_state in (None, "") else int(random_state)
        ).fit(data_np, target_np)

        prediction_array = np.asarray(fitted.predict(data_np), dtype=float)
        residual_array = target_np - prediction_array
        predictions = flat_float_list(prediction_array)
        residuals = flat_float_list(residual_array)
        training_r2 = float(fitted.score(data_np, target_np))
        mse = float(np.mean(np.square(residual_array)))

        return {
            "type": "Double",
            "basicValue": training_r2,
            "properties": {
                "training_r2": {"type": "Double", "basicValue": training_r2},
                "mean_squared_error": {"type": "Double", "basicValue": mse},
                "sample_count": {"type": "Double", "basicValue": float(data_np.shape[0])},
                "feature_count": {"type": "Double", "basicValue": float(data_np.shape[1])},
                "predictions": {"type": "Array", "elements": col(predictions)},
                "residuals": {"type": "Array", "elements": col(residuals)},
                "feature_importances": {"type": "Array", "elements": col(fitted.feature_importances_.tolist())},
                "estimator_count": {"type": "Double", "basicValue": float(len(fitted.estimators_))}
            }
        }
    except Exception as e:
        return f"Error: {str(e)}"

Online Calculator

2D array of numeric feature data with rows as samples and columns as features.
Numeric target values as a single row, single column, or scalar when only one sample is present.
Number of trees in the forest.
Split quality measure used by each decision tree.
Maximum depth of each tree. Leave blank for unconstrained depth.
Minimum number of samples required in each leaf.
Integer seed for reproducible tree sampling. Leave blank for the estimator default.