HUBER_REG

Huber regression fits a linear model with a loss that is quadratic for small residuals and linear for large ones, making it markedly more resistant to outliers than ordinary least squares. It is a robust linear regressor for contaminated tabular datasets.

The algorithm minimizes the loss function:

\min_{w, c, \sigma} \sum_{i=1}^n \left( \sigma + H_{\epsilon}\left(\frac{y_i - X_i w - c}{\sigma}\right)\sigma \right) + \alpha ||w||_2^2

where H_{\epsilon} is the Huber loss, defined as H_{\epsilon}(z) = z^2 when |z| \le \epsilon and 2\epsilon|z| - \epsilon^2 otherwise; \sigma is a jointly estimated residual scale, and \epsilon sets the threshold beyond which observations are treated as outliers.
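The piecewise loss is simple to write down directly. A minimal NumPy sketch of H_ε (following the convention scikit-learn documents, where the quadratic and linear pieces meet at ε) might look like:

```python
import numpy as np

def huber_loss(z, epsilon=1.35):
    """Elementwise Huber loss: quadratic for |z| <= epsilon, linear beyond it."""
    z = np.asarray(z, dtype=float)
    abs_z = np.abs(z)
    return np.where(abs_z <= epsilon, z ** 2, 2 * epsilon * abs_z - epsilon ** 2)

# Small residuals are squared; large ones grow only linearly.
print(huber_loss([0.5, 3.0]))  # quadratic piece: 0.25, linear piece: 6.2775
```

Because the linear branch grows only linearly in |z|, a single large residual contributes far less to the objective than it would under squared loss, which is what blunts the influence of outliers.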

This wrapper accepts tabular feature data with rows as samples and columns as features, plus a numeric target supplied as a single row or single column. It returns the training R^2 together with fitted predictions, residuals, and robust-fit summary properties.

Excel Usage

=HUBER_REG(data, target, epsilon, max_iter, alpha, fit_intercept, tol)
  • data (list[list], required): 2D array of numeric feature data with rows as samples and columns as features.
  • target (list[list], required): Numeric target values as a single row, single column, or scalar when only one sample is present.
  • epsilon (float, optional, default: 1.35): Threshold that controls when observations are treated as outliers; must be at least 1. The default of 1.35 is the standard choice, retaining roughly 95% statistical efficiency when the noise is actually Gaussian.
  • max_iter (int, optional, default: 100): Maximum number of optimizer iterations.
  • alpha (float, optional, default: 0.0001): Strength of the squared L2 regularization term.
  • fit_intercept (bool, optional, default: true): Whether to include an intercept term in the linear model.
  • tol (float, optional, default: 0.00001): Convergence tolerance for the optimizer.

Returns (dict): Excel data type containing training R^2, predictions, residuals, and robust-fit summary properties.
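The nested entity shape can be awkward to consume from Python. A small hypothetical helper (the `prop` name and the stand-in result below are illustrative, not part of the wrapper) to pull one scalar property out of a result might look like:

```python
def prop(result, name):
    """Read one scalar property's basicValue from a HUBER_REG-style result dict."""
    if isinstance(result, str):  # the wrapper returns plain error strings
        raise ValueError(result)
    return result["properties"][name]["basicValue"]

# Minimal stand-in result illustrating the documented shape.
result = {
    "type": "Double",
    "basicValue": 0.9977,
    "properties": {"training_r2": {"type": "Double", "basicValue": 0.9977}},
}
print(prop(result, "training_r2"))  # -> 0.9977
```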

Example 1: Fit Huber regression on a lightly noisy two-feature trend

Inputs:

data     target   epsilon   max_iter   alpha    fit_intercept   tol
0   0    1        1.35      200        0.0001   true            0.00001
1   0    3.1
0   1    3.9
1   1    6.2
2   1    7.8
2   2    11.1

Excel formula:

=HUBER_REG({0,0;1,0;0,1;1,1;2,1;2,2}, {1;3.1;3.9;6.2;7.8;11.1}, 1.35, 200, 0.0001, TRUE, 0.00001)

Expected output:

{"type":"Double","basicValue":0.997742,"properties":{"training_r2":{"type":"Double","basicValue":0.997742},"mean_squared_error":{"type":"Double","basicValue":0.024728},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":2},"predictions":{"type":"Array","elements":[[{"type":"Double","basicValue":0.988095}],[{"type":"Double","basicValue":3.09842}],[{"type":"Double","basicValue":3.92698}],[{"type":"Double","basicValue":6.0373}],[{"type":"Double","basicValue":8.14763}],[{"type":"Double","basicValue":11.0865}]]},"residuals":{"type":"Array","elements":[[{"type":"Double","basicValue":0.0119049}],[{"type":"Double","basicValue":0.00158119}],[{"type":"Double","basicValue":-0.0269777}],[{"type":"Double","basicValue":0.162699}],[{"type":"Double","basicValue":-0.347625}],[{"type":"Double","basicValue":0.0134922}]]},"coefficients":{"type":"Array","elements":[[{"type":"Double","basicValue":2.11032},{"type":"Double","basicValue":2.93888}]]},"intercepts":{"type":"Array","elements":[[{"type":"Double","basicValue":0.988095}]]},"scale":{"type":"Double","basicValue":0.021156},"outlier_count":{"type":"Double","basicValue":2},"outlier_mask":{"type":"Array","elements":[[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":true}],[{"type":"Boolean","basicValue":true}],[{"type":"Boolean","basicValue":false}]]},"iteration_count":{"type":"Double","basicValue":19}}}

Example 2: Flatten a single-row numeric target range for Huber regression

Inputs:

data   target                       epsilon   max_iter   alpha    fit_intercept   tol
0      1.1  3  5.2  6.8  9.1  11    1.35      200        0.0001   true            0.00001
1
2
3
4
5

Excel formula:

=HUBER_REG({0;1;2;3;4;5}, {1.1,3,5.2,6.8,9.1,11}, 1.35, 200, 0.0001, TRUE, 0.00001)

Expected output:

{"type":"Double","basicValue":0.998686,"properties":{"training_r2":{"type":"Double","basicValue":0.998686},"mean_squared_error":{"type":"Double","basicValue":0.0150855},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":1},"predictions":{"type":"Array","elements":[[{"type":"Double","basicValue":1.07755}],[{"type":"Double","basicValue":3.06653}],[{"type":"Double","basicValue":5.05551}],[{"type":"Double","basicValue":7.04449}],[{"type":"Double","basicValue":9.03347}],[{"type":"Double","basicValue":11.0224}]]},"residuals":{"type":"Array","elements":[[{"type":"Double","basicValue":0.0224459}],[{"type":"Double","basicValue":-0.0665325}],[{"type":"Double","basicValue":0.144489}],[{"type":"Double","basicValue":-0.244489}],[{"type":"Double","basicValue":0.0665325}],[{"type":"Double","basicValue":-0.0224459}]]},"coefficients":{"type":"Array","elements":[[{"type":"Double","basicValue":1.98898}]]},"intercepts":{"type":"Array","elements":[[{"type":"Double","basicValue":1.07755}]]},"scale":{"type":"Double","basicValue":0.0647083},"outlier_count":{"type":"Double","basicValue":2},"outlier_mask":{"type":"Array","elements":[[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":true}],[{"type":"Boolean","basicValue":true}],[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":false}]]},"iteration_count":{"type":"Double","basicValue":14}}}
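Example 2 works because the wrapper accepts the target in either orientation. A standalone sketch of that normalization (mirroring the `parse_target` branches in the Python code further down) might be:

```python
def flatten_target(value):
    """Normalize a scalar, flat list, single row, or single column to a flat list."""
    if not isinstance(value, list):
        return [value]                      # scalar -> one-sample target
    if all(not isinstance(item, list) for item in value):
        return list(value)                  # already flat
    if len(value) == 1:
        return list(value[0])               # single row
    if all(isinstance(row, list) and len(row) == 1 for row in value):
        return [row[0] for row in value]    # single column
    raise ValueError("target must be a single row or column")

print(flatten_target([[1.1, 3, 5.2]]))      # row    -> [1.1, 3, 5.2]
print(flatten_target([[1.1], [3], [5.2]]))  # column -> [1.1, 3, 5.2]
```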

Example 3: Fit Huber regression with one strong target outlier

Inputs:

data   target   epsilon   max_iter   alpha    fit_intercept   tol
0      1        1.2       200        0.0001   true            0.00001
1      3
2      5
3      7
4      9
5      25

Excel formula:

=HUBER_REG({0;1;2;3;4;5}, {1;3;5;7;9;25}, 1.2, 200, 0.0001, TRUE, 0.00001)

Expected output:

{"type":"Double","basicValue":0.475,"properties":{"training_r2":{"type":"Double","basicValue":0.475},"mean_squared_error":{"type":"Double","basicValue":32.6667},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":1},"predictions":{"type":"Array","elements":[[{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":3}],[{"type":"Double","basicValue":5}],[{"type":"Double","basicValue":7}],[{"type":"Double","basicValue":9}],[{"type":"Double","basicValue":11}]]},"residuals":{"type":"Array","elements":[[{"type":"Double","basicValue":2.00533e-9}],[{"type":"Double","basicValue":-8.37614e-9}],[{"type":"Double","basicValue":-1.87576e-8}],[{"type":"Double","basicValue":-2.91391e-8}],[{"type":"Double","basicValue":-3.95206e-8}],[{"type":"Double","basicValue":14}]]},"coefficients":{"type":"Array","elements":[[{"type":"Double","basicValue":2}]]},"intercepts":{"type":"Array","elements":[[{"type":"Double","basicValue":1}]]},"scale":{"type":"Double","basicValue":2.3514e-8},"outlier_count":{"type":"Double","basicValue":3},"outlier_mask":{"type":"Array","elements":[[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":true}],[{"type":"Boolean","basicValue":true}],[{"type":"Boolean","basicValue":true}]]},"iteration_count":{"type":"Double","basicValue":57}}}
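Example 3 is where the robustness shows: the Huber fit above reports a slope of 2 and intercept of 1, essentially ignoring the outlier, while ordinary least squares on the same data is pulled hard by it. A quick NumPy check of the OLS line:

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5], dtype=float)
y = np.array([1, 3, 5, 7, 9, 25], dtype=float)  # last point is the outlier

# Ordinary least squares via polyfit: the single outlier drags the slope
# from the underlying 2 all the way to 4.
slope, intercept = np.polyfit(x, y, 1)
print(slope, intercept)  # slope 4.0, intercept -5/3
```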

Example 4: Fit a no-intercept Huber model on a two-feature plane

Inputs:

data     target   epsilon   max_iter   alpha    fit_intercept   tol
1   0    2.1      1.35      200        0.0001   false           0.00001
0   1    2.9
1   1    5
2   1    7.1
1   2    8
2   2    10.2

Excel formula:

=HUBER_REG({1,0;0,1;1,1;2,1;1,2;2,2}, {2.1;2.9;5;7.1;8;10.2}, 1.35, 200, 0.0001, FALSE, 0.00001)

Expected output:

{"type":"Double","basicValue":0.999605,"properties":{"training_r2":{"type":"Double","basicValue":0.999605},"mean_squared_error":{"type":"Double","basicValue":0.00320129},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":2},"predictions":{"type":"Array","elements":[[{"type":"Double","basicValue":2.0955}],[{"type":"Double","basicValue":2.9455}],[{"type":"Double","basicValue":5.04101}],[{"type":"Double","basicValue":7.13651}],[{"type":"Double","basicValue":7.98651}],[{"type":"Double","basicValue":10.082}]]},"residuals":{"type":"Array","elements":[[{"type":"Double","basicValue":0.00449516}],[{"type":"Double","basicValue":-0.0455033}],[{"type":"Double","basicValue":-0.0410081}],[{"type":"Double","basicValue":-0.0365129}],[{"type":"Double","basicValue":0.0134886}],[{"type":"Double","basicValue":0.117984}]]},"coefficients":{"type":"Array","elements":[[{"type":"Double","basicValue":2.0955},{"type":"Double","basicValue":2.9455}]]},"intercepts":{"type":"Array","elements":[[{"type":"Double","basicValue":0}]]},"scale":{"type":"Double","basicValue":0.0355769},"outlier_count":{"type":"Double","basicValue":1},"outlier_mask":{"type":"Array","elements":[[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":true}]]},"iteration_count":{"type":"Double","basicValue":16}}}

Python Code

import numpy as np
from sklearn.linear_model import HuberRegressor as SklearnHuberRegressor

def huber_reg(data, target, epsilon=1.35, max_iter=100, alpha=0.0001, fit_intercept=True, tol=1e-05):
    """
    Fit a Huber regression model and return training predictions.

    See: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.HuberRegressor.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        data (list[list]): 2D array of numeric feature data with rows as samples and columns as features.
        target (list[list]): Numeric target values as a single row, single column, or scalar when only one sample is present.
        epsilon (float, optional): Threshold that controls when observations are treated as outliers. Default is 1.35.
        max_iter (int, optional): Maximum number of optimizer iterations. Default is 100.
        alpha (float, optional): Strength of the squared L2 regularization term. Default is 0.0001.
        fit_intercept (bool, optional): Whether to include an intercept term in the linear model. Default is True.
        tol (float, optional): Convergence tolerance for the optimizer. Default is 1e-05.

    Returns:
        dict: Excel data type containing training R^2, predictions, residuals, and robust-fit summary properties.
    """
    def py(value):
        # Unwrap NumPy scalar types to plain Python values.
        return value.item() if isinstance(value, np.generic) else value

    def cell(value):
        # Wrap one scalar in the Excel entity cell format.
        value = py(value)
        if isinstance(value, bool):
            return {"type": "Boolean", "basicValue": bool(value)}
        if isinstance(value, (int, float)):
            return {"type": "Double", "basicValue": float(value)}
        return {"type": "String", "basicValue": str(value)}

    def col(values):
        # Render a sequence as a single-column Excel array.
        return [[cell(value)] for value in values]

    def mat(values):
        # Render a 2D sequence as an Excel array, row by row.
        return [[cell(value) for value in row] for row in values]

    def parse_data(value):
        # Validate the feature matrix and coerce it to a float ndarray.
        value = [[value]] if not isinstance(value, list) else value
        if not isinstance(value, list) or not value or not all(isinstance(row, list) and row for row in value):
            return None, "Error: data must be a non-empty 2D list"
        if len({len(row) for row in value}) != 1:
            return None, "Error: data must be a rectangular 2D list"
        data_np = np.array(value, dtype=float)
        if data_np.ndim != 2 or data_np.size == 0:
            return None, "Error: data must be a non-empty 2D list"
        if not np.isfinite(data_np).all():
            return None, "Error: data must contain only finite numeric values"
        return data_np, None

    def parse_target(value, sample_count):
        # Accept the target as a scalar, flat list, single row, or single column.
        if not isinstance(value, list):
            labels = [value]
        elif not value:
            return None, "Error: target must be non-empty"
        elif all(not isinstance(item, list) for item in value):
            labels = value
        elif len(value) == 1:
            labels = value[0]
        elif all(isinstance(row, list) and len(row) == 1 for row in value):
            labels = [row[0] for row in value]
        else:
            return None, "Error: target must be a single row or column"

        if len(labels) != sample_count:
            return None, "Error: target length must match sample count"

        parsed = []
        for item in labels:
            item = py(item)
            if isinstance(item, bool) or not isinstance(item, (int, float)):
                return None, "Error: target values must be finite numeric scalars"
            if not np.isfinite(float(item)):
                return None, "Error: target values must be finite numeric scalars"
            parsed.append(float(item))
        return np.array(parsed, dtype=float), None

    def flat_float_list(values):
        return [float(py(item)) for item in np.asarray(values).reshape(-1).tolist()]

    try:
        data_np, error = parse_data(data)
        if error:
            return error

        target_np, error = parse_target(target, data_np.shape[0])
        if error:
            return error

        if float(epsilon) < 1:
            return "Error: epsilon must be at least 1"
        if int(max_iter) < 1:
            return "Error: max_iter must be at least 1"
        if float(alpha) < 0:
            return "Error: alpha must be non-negative"
        if float(tol) <= 0:
            return "Error: tol must be greater than 0"

        fitted = SklearnHuberRegressor(
            epsilon=float(epsilon),
            max_iter=int(max_iter),
            alpha=float(alpha),
            fit_intercept=bool(fit_intercept),
            tol=float(tol)
        ).fit(data_np, target_np)

        prediction_array = np.asarray(fitted.predict(data_np), dtype=float)
        residual_array = target_np - prediction_array
        predictions = flat_float_list(prediction_array)
        residuals = flat_float_list(residual_array)
        training_r2 = float(fitted.score(data_np, target_np))
        mse = float(np.mean(np.square(residual_array)))
        outlier_mask = [bool(value) for value in np.asarray(fitted.outliers_).reshape(-1).tolist()]

        return {
            "type": "Double",
            "basicValue": training_r2,
            "properties": {
                "training_r2": {"type": "Double", "basicValue": training_r2},
                "mean_squared_error": {"type": "Double", "basicValue": mse},
                "sample_count": {"type": "Double", "basicValue": float(data_np.shape[0])},
                "feature_count": {"type": "Double", "basicValue": float(data_np.shape[1])},
                "predictions": {"type": "Array", "elements": col(predictions)},
                "residuals": {"type": "Array", "elements": col(residuals)},
                "coefficients": {"type": "Array", "elements": mat(np.atleast_2d(fitted.coef_).tolist())},
                "intercepts": {"type": "Array", "elements": col(np.atleast_1d(fitted.intercept_).tolist())},
                "scale": {"type": "Double", "basicValue": float(fitted.scale_)},
                "outlier_count": {"type": "Double", "basicValue": float(sum(outlier_mask))},
                "outlier_mask": {"type": "Array", "elements": col(outlier_mask)},
                "iteration_count": {"type": "Double", "basicValue": float(fitted.n_iter_)}
            }
        }
    except Exception as e:
        return f"Error: {str(e)}"
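Since the wrapper delegates to scikit-learn's HuberRegressor, Example 3 can be reproduced by calling the estimator directly; under the documented settings the robust fit should recover roughly slope 2 and intercept 1 despite the outlier:

```python
import numpy as np
from sklearn.linear_model import HuberRegressor

X = np.array([[0], [1], [2], [3], [4], [5]], dtype=float)
y = np.array([1, 3, 5, 7, 9, 25], dtype=float)  # Example 3's outlier target

model = HuberRegressor(epsilon=1.2, max_iter=200, alpha=0.0001, tol=1e-05)
model.fit(X, y)
print(model.coef_, model.intercept_)  # close to [2.0] and 1.0
print(model.outliers_.sum())          # number of observations flagged as outliers
```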
