HUBER_REG
Huber regression penalizes small residuals quadratically and large residuals linearly, making it more resistant to outliers than ordinary least squares. It is a robust linear regressor for contaminated tabular datasets.
The algorithm minimizes the loss function:
\min_{w, c, \sigma} \sum_{i=1}^n \left( \sigma + H_{\epsilon}\left(\frac{y_i - X_i w - c}{\sigma}\right)\sigma \right) + \alpha ||w||_2^2
where H_{\epsilon} is the Huber loss function, \sigma scales the residuals, and \epsilon defines the threshold for outliers.
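The piecewise form of H_{\epsilon} can be sketched in a few lines of Python (a minimal illustration of the loss shape, not part of the wrapper itself):

```python
def huber_loss(z, epsilon=1.35):
    """Huber loss H_eps(z): quadratic for small |z|, linear beyond epsilon."""
    if abs(z) <= epsilon:
        return z * z
    # Linear branch, offset so the two pieces join continuously at |z| = epsilon
    return 2.0 * epsilon * abs(z) - epsilon * epsilon
```

Residuals inside the epsilon band behave like ordinary least squares; residuals beyond it grow only linearly, which caps the influence of outliers on the fit.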
This wrapper accepts tabular feature data with rows as samples and columns as features, plus a numeric target supplied as a single row or single column. It returns the training R^2 together with fitted predictions, residuals, and robust-fit summary properties.
Excel Usage
=HUBER_REG(data, target, epsilon, max_iter, alpha, fit_intercept, tol)
- data (list[list], required): 2D array of numeric feature data with rows as samples and columns as features.
- target (list[list], required): Numeric target values as a single row, single column, or scalar when only one sample is present.
- epsilon (float, optional, default: 1.35): Threshold that controls when observations are treated as outliers.
- max_iter (int, optional, default: 100): Maximum number of optimizer iterations.
- alpha (float, optional, default: 0.0001): Strength of the squared L2 regularization term.
- fit_intercept (bool, optional, default: true): Whether to include an intercept term in the linear model.
- tol (float, optional, default: 0.00001): Convergence tolerance for the optimizer.
Returns (dict): Excel data type containing training R^2, predictions, residuals, and robust-fit summary properties.
Example 1: Fit Huber regression on a lightly noisy two-feature trend
Inputs:
| data | | target | epsilon | max_iter | alpha | fit_intercept | tol |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | 1.35 | 200 | 0.0001 | true | 0.00001 |
| 1 | 0 | 3.1 | |||||
| 0 | 1 | 3.9 | |||||
| 1 | 1 | 6.2 | |||||
| 2 | 1 | 7.8 | |||||
| 2 | 2 | 11.1 |
Excel formula:
=HUBER_REG({0,0;1,0;0,1;1,1;2,1;2,2}, {1;3.1;3.9;6.2;7.8;11.1}, 1.35, 200, 0.0001, TRUE, 0.00001)
Expected output:
{"type":"Double","basicValue":0.997742,"properties":{"training_r2":{"type":"Double","basicValue":0.997742},"mean_squared_error":{"type":"Double","basicValue":0.024728},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":2},"predictions":{"type":"Array","elements":[[{"type":"Double","basicValue":0.988095}],[{"type":"Double","basicValue":3.09842}],[{"type":"Double","basicValue":3.92698}],[{"type":"Double","basicValue":6.0373}],[{"type":"Double","basicValue":8.14763}],[{"type":"Double","basicValue":11.0865}]]},"residuals":{"type":"Array","elements":[[{"type":"Double","basicValue":0.0119049}],[{"type":"Double","basicValue":0.00158119}],[{"type":"Double","basicValue":-0.0269777}],[{"type":"Double","basicValue":0.162699}],[{"type":"Double","basicValue":-0.347625}],[{"type":"Double","basicValue":0.0134922}]]},"coefficients":{"type":"Array","elements":[[{"type":"Double","basicValue":2.11032},{"type":"Double","basicValue":2.93888}]]},"intercepts":{"type":"Array","elements":[[{"type":"Double","basicValue":0.988095}]]},"scale":{"type":"Double","basicValue":0.021156},"outlier_count":{"type":"Double","basicValue":2},"outlier_mask":{"type":"Array","elements":[[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":true}],[{"type":"Boolean","basicValue":true}],[{"type":"Boolean","basicValue":false}]]},"iteration_count":{"type":"Double","basicValue":19}}}
Example 2: Flatten a single-row numeric target range for Huber regression
Inputs:
| data | target | | | | | | epsilon | max_iter | alpha | fit_intercept | tol |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.1 | 3 | 5.2 | 6.8 | 9.1 | 11 | 1.35 | 200 | 0.0001 | true | 0.00001 |
| 1 | |||||||||||
| 2 | |||||||||||
| 3 | |||||||||||
| 4 | |||||||||||
| 5 |
Excel formula:
=HUBER_REG({0;1;2;3;4;5}, {1.1,3,5.2,6.8,9.1,11}, 1.35, 200, 0.0001, TRUE, 0.00001)
Expected output:
{"type":"Double","basicValue":0.998686,"properties":{"training_r2":{"type":"Double","basicValue":0.998686},"mean_squared_error":{"type":"Double","basicValue":0.0150855},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":1},"predictions":{"type":"Array","elements":[[{"type":"Double","basicValue":1.07755}],[{"type":"Double","basicValue":3.06653}],[{"type":"Double","basicValue":5.05551}],[{"type":"Double","basicValue":7.04449}],[{"type":"Double","basicValue":9.03347}],[{"type":"Double","basicValue":11.0224}]]},"residuals":{"type":"Array","elements":[[{"type":"Double","basicValue":0.0224459}],[{"type":"Double","basicValue":-0.0665325}],[{"type":"Double","basicValue":0.144489}],[{"type":"Double","basicValue":-0.244489}],[{"type":"Double","basicValue":0.0665325}],[{"type":"Double","basicValue":-0.0224459}]]},"coefficients":{"type":"Array","elements":[[{"type":"Double","basicValue":1.98898}]]},"intercepts":{"type":"Array","elements":[[{"type":"Double","basicValue":1.07755}]]},"scale":{"type":"Double","basicValue":0.0647083},"outlier_count":{"type":"Double","basicValue":2},"outlier_mask":{"type":"Array","elements":[[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":true}],[{"type":"Boolean","basicValue":true}],[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":false}]]},"iteration_count":{"type":"Double","basicValue":14}}}
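The single-row handling in Example 2 mirrors the wrapper's target parsing: a 1×n range arrives as one nested row and is flattened before fitting. A standalone sketch of that normalization (the helper name `flatten_target` is hypothetical, chosen for illustration):

```python
def flatten_target(value):
    """Normalize a scalar, single row, or single column into a flat list."""
    if not isinstance(value, list):
        return [value]                 # scalar -> one sample
    if all(not isinstance(item, list) for item in value):
        return list(value)             # already a flat list
    if len(value) == 1:
        return list(value[0])          # single row, e.g. {1.1,3,5.2,6.8,9.1,11}
    return [row[0] for row in value]   # single column
```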
Example 3: Fit Huber regression with one strong target outlier
Inputs:
| data | target | epsilon | max_iter | alpha | fit_intercept | tol |
|---|---|---|---|---|---|---|
| 0 | 1 | 1.2 | 200 | 0.0001 | true | 0.00001 |
| 1 | 3 | |||||
| 2 | 5 | |||||
| 3 | 7 | |||||
| 4 | 9 | |||||
| 5 | 25 |
Excel formula:
=HUBER_REG({0;1;2;3;4;5}, {1;3;5;7;9;25}, 1.2, 200, 0.0001, TRUE, 0.00001)
Expected output:
{"type":"Double","basicValue":0.475,"properties":{"training_r2":{"type":"Double","basicValue":0.475},"mean_squared_error":{"type":"Double","basicValue":32.6667},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":1},"predictions":{"type":"Array","elements":[[{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":3}],[{"type":"Double","basicValue":5}],[{"type":"Double","basicValue":7}],[{"type":"Double","basicValue":9}],[{"type":"Double","basicValue":11}]]},"residuals":{"type":"Array","elements":[[{"type":"Double","basicValue":2.00533e-9}],[{"type":"Double","basicValue":-8.37614e-9}],[{"type":"Double","basicValue":-1.87576e-8}],[{"type":"Double","basicValue":-2.91391e-8}],[{"type":"Double","basicValue":-3.95206e-8}],[{"type":"Double","basicValue":14}]]},"coefficients":{"type":"Array","elements":[[{"type":"Double","basicValue":2}]]},"intercepts":{"type":"Array","elements":[[{"type":"Double","basicValue":1}]]},"scale":{"type":"Double","basicValue":2.3514e-8},"outlier_count":{"type":"Double","basicValue":3},"outlier_mask":{"type":"Array","elements":[[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":true}],[{"type":"Boolean","basicValue":true}],[{"type":"Boolean","basicValue":true}]]},"iteration_count":{"type":"Double","basicValue":57}}}
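Example 3 is where the robustness shows: on the same data, ordinary least squares is dragged toward the y = 25 outlier while the Huber fit stays near the clean slope of 2. A sketch of the comparison, assuming scikit-learn is available:

```python
import numpy as np
from sklearn.linear_model import HuberRegressor, LinearRegression

X = np.arange(6, dtype=float).reshape(-1, 1)
y = np.array([1.0, 3.0, 5.0, 7.0, 9.0, 25.0])  # last point is a strong outlier

ols = LinearRegression().fit(X, y)
huber = HuberRegressor(epsilon=1.2, max_iter=200,
                       alpha=0.0001, tol=1e-05).fit(X, y)

ols_slope = ols.coef_[0]      # pulled up by the outlier (4.0 on this data)
huber_slope = huber.coef_[0]  # stays near the clean trend's slope of 2
```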
Example 4: Fit a no-intercept Huber model on a two-feature plane
Inputs:
| data | | target | epsilon | max_iter | alpha | fit_intercept | tol |
|---|---|---|---|---|---|---|---|
| 1 | 0 | 2.1 | 1.35 | 200 | 0.0001 | false | 0.00001 |
| 0 | 1 | 2.9 | |||||
| 1 | 1 | 5 | |||||
| 2 | 1 | 7.1 | |||||
| 1 | 2 | 8 | |||||
| 2 | 2 | 10.2 |
Excel formula:
=HUBER_REG({1,0;0,1;1,1;2,1;1,2;2,2}, {2.1;2.9;5;7.1;8;10.2}, 1.35, 200, 0.0001, FALSE, 0.00001)
Expected output:
{"type":"Double","basicValue":0.999605,"properties":{"training_r2":{"type":"Double","basicValue":0.999605},"mean_squared_error":{"type":"Double","basicValue":0.00320129},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":2},"predictions":{"type":"Array","elements":[[{"type":"Double","basicValue":2.0955}],[{"type":"Double","basicValue":2.9455}],[{"type":"Double","basicValue":5.04101}],[{"type":"Double","basicValue":7.13651}],[{"type":"Double","basicValue":7.98651}],[{"type":"Double","basicValue":10.082}]]},"residuals":{"type":"Array","elements":[[{"type":"Double","basicValue":0.00449516}],[{"type":"Double","basicValue":-0.0455033}],[{"type":"Double","basicValue":-0.0410081}],[{"type":"Double","basicValue":-0.0365129}],[{"type":"Double","basicValue":0.0134886}],[{"type":"Double","basicValue":0.117984}]]},"coefficients":{"type":"Array","elements":[[{"type":"Double","basicValue":2.0955},{"type":"Double","basicValue":2.9455}]]},"intercepts":{"type":"Array","elements":[[{"type":"Double","basicValue":0}]]},"scale":{"type":"Double","basicValue":0.0355769},"outlier_count":{"type":"Double","basicValue":1},"outlier_mask":{"type":"Array","elements":[[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":true}]]},"iteration_count":{"type":"Double","basicValue":16}}}
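With fit_intercept set to FALSE the plane is forced through the origin, and scikit-learn reports `intercept_` as 0.0, which is why the intercepts property in Example 4 is zero. A quick check of that behavior, assuming scikit-learn is available:

```python
import numpy as np
from sklearn.linear_model import HuberRegressor

X = np.array([[1, 0], [0, 1], [1, 1], [2, 1], [1, 2], [2, 2]], dtype=float)
y = np.array([2.1, 2.9, 5.0, 7.1, 8.0, 10.2])

model = HuberRegressor(epsilon=1.35, max_iter=200, alpha=0.0001,
                       fit_intercept=False, tol=1e-05).fit(X, y)

intercept = model.intercept_  # fixed at 0.0 when fit_intercept=False
```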
Python Code
import numpy as np
from sklearn.linear_model import HuberRegressor as SklearnHuberRegressor
def huber_reg(data, target, epsilon=1.35, max_iter=100, alpha=0.0001, fit_intercept=True, tol=1e-05):
    """
    Fit a Huber regression model and return training predictions.
    See: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.HuberRegressor.html
    This example function is provided as-is without any representation of accuracy.

    Args:
        data (list[list]): 2D array of numeric feature data with rows as samples and columns as features.
        target (list[list]): Numeric target values as a single row, single column, or scalar when only one sample is present.
        epsilon (float, optional): Threshold that controls when observations are treated as outliers. Default is 1.35.
        max_iter (int, optional): Maximum number of optimizer iterations. Default is 100.
        alpha (float, optional): Strength of the squared L2 regularization term. Default is 0.0001.
        fit_intercept (bool, optional): Whether to include an intercept term in the linear model. Default is True.
        tol (float, optional): Convergence tolerance for the optimizer. Default is 1e-05.

    Returns:
        dict: Excel data type containing training R^2, predictions, residuals, and robust-fit summary properties.
    """
    def py(value):
        return value.item() if isinstance(value, np.generic) else value

    def cell(value):
        value = py(value)
        if isinstance(value, bool):
            return {"type": "Boolean", "basicValue": bool(value)}
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            return {"type": "Double", "basicValue": float(value)}
        return {"type": "String", "basicValue": str(value)}

    def col(values):
        return [[cell(value)] for value in values]

    def mat(values):
        return [[cell(value) for value in row] for row in values]

    def parse_data(value):
        value = [[value]] if not isinstance(value, list) else value
        if not isinstance(value, list) or not value or not all(isinstance(row, list) and row for row in value):
            return None, "Error: data must be a non-empty 2D list"
        if len({len(row) for row in value}) != 1:
            return None, "Error: data must be a rectangular 2D list"
        data_np = np.array(value, dtype=float)
        if data_np.ndim != 2 or data_np.size == 0:
            return None, "Error: data must be a non-empty 2D list"
        if not np.isfinite(data_np).all():
            return None, "Error: data must contain only finite numeric values"
        return data_np, None

    def parse_target(value, sample_count):
        if not isinstance(value, list):
            labels = [value]
        elif not value:
            return None, "Error: target must be non-empty"
        elif all(not isinstance(item, list) for item in value):
            labels = value
        elif len(value) == 1:
            labels = value[0]
        elif all(isinstance(row, list) and len(row) == 1 for row in value):
            labels = [row[0] for row in value]
        else:
            return None, "Error: target must be a single row or column"
        if len(labels) != sample_count:
            return None, "Error: target length must match sample count"
        parsed = []
        for item in labels:
            item = py(item)
            if isinstance(item, bool) or not isinstance(item, (int, float)):
                return None, "Error: target values must be finite numeric scalars"
            if not np.isfinite(float(item)):
                return None, "Error: target values must be finite numeric scalars"
            parsed.append(float(item))
        return np.array(parsed, dtype=float), None

    def flat_float_list(values):
        return [float(py(item)) for item in np.asarray(values).reshape(-1).tolist()]

    try:
        data_np, error = parse_data(data)
        if error:
            return error
        target_np, error = parse_target(target, data_np.shape[0])
        if error:
            return error
        if float(epsilon) < 1:
            return "Error: epsilon must be at least 1"
        if int(max_iter) < 1:
            return "Error: max_iter must be at least 1"
        if float(alpha) < 0:
            return "Error: alpha must be non-negative"
        if float(tol) <= 0:
            return "Error: tol must be greater than 0"
        fitted = SklearnHuberRegressor(
            epsilon=float(epsilon),
            max_iter=int(max_iter),
            alpha=float(alpha),
            fit_intercept=bool(fit_intercept),
            tol=float(tol)
        ).fit(data_np, target_np)
        prediction_array = np.asarray(fitted.predict(data_np), dtype=float)
        residual_array = target_np - prediction_array
        predictions = flat_float_list(prediction_array)
        residuals = flat_float_list(residual_array)
        training_r2 = float(fitted.score(data_np, target_np))
        mse = float(np.mean(np.square(residual_array)))
        outlier_mask = [bool(value) for value in np.asarray(fitted.outliers_).reshape(-1).tolist()]
        return {
            "type": "Double",
            "basicValue": training_r2,
            "properties": {
                "training_r2": {"type": "Double", "basicValue": training_r2},
                "mean_squared_error": {"type": "Double", "basicValue": mse},
                "sample_count": {"type": "Double", "basicValue": float(data_np.shape[0])},
                "feature_count": {"type": "Double", "basicValue": float(data_np.shape[1])},
                "predictions": {"type": "Array", "elements": col(predictions)},
                "residuals": {"type": "Array", "elements": col(residuals)},
                "coefficients": {"type": "Array", "elements": mat(np.atleast_2d(fitted.coef_).tolist())},
                "intercepts": {"type": "Array", "elements": col(np.atleast_1d(fitted.intercept_).tolist())},
                "scale": {"type": "Double", "basicValue": float(fitted.scale_)},
                "outlier_count": {"type": "Double", "basicValue": float(sum(outlier_mask))},
                "outlier_mask": {"type": "Array", "elements": col(outlier_mask)},
                "iteration_count": {"type": "Double", "basicValue": float(fitted.n_iter_)}
            }
        }
    except Exception as e:
        return f"Error: {str(e)}"