Classification
GAUSSIAN_NB
Gaussian naive Bayes is a probabilistic classifier that applies Bayes’ theorem with the “naive” assumption of conditional independence between every pair of features given the value of the class variable:
P(y \mid x_1, \dots, x_n) \propto P(y) \prod_{i=1}^n P(x_i \mid y)
In Gaussian naive Bayes, the likelihood of the features is assumed to follow a normal distribution:
P(x_i \mid y) = \frac{1}{\sqrt{2\pi\sigma_{iy}^2}} \exp\left(-\frac{(x_i - \mu_{iy})^2}{2\sigma_{iy}^2}\right)
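The per-class parameters \mu_{iy} and \sigma_{iy}^2 are simply the within-class feature means and (biased) variances, which is what the `theta` and `variances` arrays in the examples below report. A minimal illustrative sketch of the likelihood (not part of the function itself), using the class "low" feature values from Example 1:

```python
import math

def gaussian_likelihood(x, mu, var):
    # P(x_i | y): normal density with the class's fitted mean and variance
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Class "low" in Example 1 has feature-0 values [0, 0.1, 0.2]; sklearn's
# GaussianNB stores their mean in theta_ and the biased variance in var_.
values = [0, 0.1, 0.2]
mu = sum(values) / len(values)                          # 0.1
var = sum((v - mu) ** 2 for v in values) / len(values)  # ~0.00666667
print(gaussian_likelihood(0.1, mu, var))                # density at the class mean
```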
This implementation accepts rows as samples and a target supplied as a single row or single column. It returns training accuracy together with predicted labels, class counts, fitted probabilities, class priors, and per-class Gaussian parameters.
Excel Usage
=GAUSSIAN_NB(data, target, var_smoothing)
data (list[list], required): 2D array of numeric feature data with rows as samples and columns as features.
target (list[list], required): Target labels as a single row, single column, or scalar when only one sample is present.
var_smoothing (float, optional, default: 1e-9): Portion of the largest feature variance added for numerical stability.
Returns (dict): Excel data type containing training accuracy, predictions, probabilities, and fitted Gaussian parameter arrays.
Example 1: Fit Gaussian naive Bayes for two string-labeled classes
Inputs:
| data | | target | var_smoothing |
|---|---|---|---|
| 0 | 0 | low | 1e-9 |
| 0.1 | 0.2 | low | |
| 0.2 | 0 | low | |
| 2 | 2 | high | |
| 2.1 | 2.2 | high | |
| 2.2 | 2 | high |
Excel formula:
=GAUSSIAN_NB({0,0;0.1,0.2;0.2,0;2,2;2.1,2.2;2.2,2}, {"low";"low";"low";"high";"high";"high"}, 1e-9)
Expected output:
{"type":"Double","basicValue":1,"properties":{"accuracy":{"type":"Double","basicValue":1},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":2},"class_count":{"type":"Double","basicValue":2},"classes":{"type":"Array","elements":[[{"type":"String","basicValue":"high"}],[{"type":"String","basicValue":"low"}]]},"predictions":{"type":"Array","elements":[[{"type":"String","basicValue":"low"}],[{"type":"String","basicValue":"low"}],[{"type":"String","basicValue":"low"}],[{"type":"String","basicValue":"high"}],[{"type":"String","basicValue":"high"}],[{"type":"String","basicValue":"high"}]]},"prediction_counts":{"type":"Array","elements":[[{"type":"String","basicValue":"class"},{"type":"String","basicValue":"count"}],[{"type":"String","basicValue":"high"},{"type":"Double","basicValue":3}],[{"type":"String","basicValue":"low"},{"type":"Double","basicValue":3}]]},"probabilities":{"type":"Array","elements":[[{"type":"Double","basicValue":2.83256e-248},{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":1.05745e-215},{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":3.23477e-222},{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":3.45682e-209}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":9.25966e-242}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":3.027e-235}]]},"class_priors":{"type":"Array","elements":[[{"type":"Double","basicValue":0.5}],[{"type":"Double","basicValue":0.5}]]},"theta":{"type":"Array","elements":[[{"type":"Double","basicValue":2.1},{"type":"Double","basicValue":2.06667}],[{"type":"Double","basicValue":0.1},{"type":"Double","basicValue":0.0666667}]]},"variances":{"type":"Array","elements":[[{"type":"Double","basicValue":0.00666667},{"type":"Double","basicValue":0.00888889}],[{"type":"Double","basicValue":0.00666667},{"type":"Double","basicValue":0.00888889}]]}}}
Example 2: Fit Gaussian naive Bayes for one-dimensional numeric labels
Inputs:
| data | target | var_smoothing |
|---|---|---|
| 0 | 0 | 1e-9 |
| 0.2 | 0 | |
| 0.4 | 0 | |
| 1.2 | 1 | |
| 1.4 | 1 | |
| 1.6 | 1 |
Excel formula:
=GAUSSIAN_NB({0;0.2;0.4;1.2;1.4;1.6}, {0;0;0;1;1;1}, 1e-9)
Expected output:
{"type":"Double","basicValue":1,"properties":{"accuracy":{"type":"Double","basicValue":1},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":1},"class_count":{"type":"Double","basicValue":2},"classes":{"type":"Array","elements":[[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1}]]},"predictions":{"type":"Array","elements":[[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":1}]]},"prediction_counts":{"type":"Array","elements":[[{"type":"String","basicValue":"class"},{"type":"String","basicValue":"count"}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":3}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":3}]]},"probabilities":{"type":"Array","elements":[[{"type":"Double","basicValue":1},{"type":"Double","basicValue":2.31952e-16}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":1.87953e-12}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":1.523e-8}],[{"type":"Double","basicValue":1.523e-8},{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":1.87953e-12},{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":2.31952e-16},{"type":"Double","basicValue":1}]]},"class_priors":{"type":"Array","elements":[[{"type":"Double","basicValue":0.5}],[{"type":"Double","basicValue":0.5}]]},"theta":{"type":"Array","elements":[[{"type":"Double","basicValue":0.2}],[{"type":"Double","basicValue":1.4}]]},"variances":{"type":"Array","elements":[[{"type":"Double","basicValue":0.0266667}],[{"type":"Double","basicValue":0.0266667}]]}}}
Example 3: Fit Gaussian naive Bayes for three separated groups
Inputs:
| data | | target | var_smoothing |
|---|---|---|---|
| 0 | 0 | left | 1e-9 |
| 0.2 | 0.1 | left | |
| 4 | 4 | center | |
| 4.2 | 3.9 | center | |
| 8 | 0 | right | |
| 8.2 | 0.1 | right |
Excel formula:
=GAUSSIAN_NB({0,0;0.2,0.1;4,4;4.2,3.9;8,0;8.2,0.1}, {"left";"left";"center";"center";"right";"right"}, 1e-9)
Expected output:
{"type":"Double","basicValue":1,"properties":{"accuracy":{"type":"Double","basicValue":1},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":2},"class_count":{"type":"Double","basicValue":3},"classes":{"type":"Array","elements":[[{"type":"String","basicValue":"center"}],[{"type":"String","basicValue":"left"}],[{"type":"String","basicValue":"right"}]]},"predictions":{"type":"Array","elements":[[{"type":"String","basicValue":"left"}],[{"type":"String","basicValue":"left"}],[{"type":"String","basicValue":"center"}],[{"type":"String","basicValue":"center"}],[{"type":"String","basicValue":"right"}],[{"type":"String","basicValue":"right"}]]},"prediction_counts":{"type":"Array","elements":[[{"type":"String","basicValue":"class"},{"type":"String","basicValue":"count"}],[{"type":"String","basicValue":"center"},{"type":"Double","basicValue":2}],[{"type":"String","basicValue":"left"},{"type":"Double","basicValue":2}],[{"type":"String","basicValue":"right"},{"type":"Double","basicValue":2}]]},"probabilities":{"type":"Array","elements":[[{"type":"Double","basicValue":0},{"type":"Double","basicValue":1},{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":1},{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":0},{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":0},{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":0},{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":0},{"type":"Double","basicValue":1}]]},"class_priors":{"type":"Array","elements":[[{"type":"Double","basicValue":0.333333}],[{"type":"Double","basicValue":0.333333}],[{"type":"Double","basicValue":0.333333}]]},"theta":{"type":"Array","elements":[[{"type":"Double","basicValue":4.1},{"type":"Double","basicValue":3.95}],[{"type":"Double","basicValue":0.1},{"type":"Double","basicValue":0.05}],[{"type":"Double","basicValue":8.1},{"type":"Double","basicValue":0.05}]]},"variances":{"type":"Array","elements":[[{"type":"Double","basicValue":0.01},{"type":"Double","basicValue":0.00250001}],[{"type":"Double","basicValue":0.01},{"type":"Double","basicValue":0.00250001}],[{"type":"Double","basicValue":0.01},{"type":"Double","basicValue":0.00250001}]]}}}
Example 4: Flatten a single-row boolean target range for Gaussian naive Bayes
Inputs:
| data | target | | | | | | var_smoothing |
|---|---|---|---|---|---|---|---|
| 0 | false | false | false | true | true | true | 1e-9 |
| 0.3 | |||||||
| 0.6 | |||||||
| 1.4 | |||||||
| 1.7 | |||||||
| 2 |
Excel formula:
=GAUSSIAN_NB({0;0.3;0.6;1.4;1.7;2}, {FALSE,FALSE,FALSE,TRUE,TRUE,TRUE}, 1e-9)
Expected output:
{"type":"Double","basicValue":1,"properties":{"accuracy":{"type":"Double","basicValue":1},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":1},"class_count":{"type":"Double","basicValue":2},"classes":{"type":"Array","elements":[[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":true}]]},"predictions":{"type":"Array","elements":[[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":true}],[{"type":"Boolean","basicValue":true}],[{"type":"Boolean","basicValue":true}]]},"prediction_counts":{"type":"Array","elements":[[{"type":"String","basicValue":"class"},{"type":"String","basicValue":"count"}],[{"type":"Boolean","basicValue":false},{"type":"Double","basicValue":3}],[{"type":"Boolean","basicValue":true},{"type":"Double","basicValue":3}]]},"probabilities":{"type":"Array","elements":[[{"type":"Double","basicValue":1},{"type":"Double","basicValue":7.35296e-11}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":8.0635e-8}],[{"type":"Double","basicValue":0.999912},{"type":"Double","basicValue":0.0000884192}],[{"type":"Double","basicValue":0.0000884192},{"type":"Double","basicValue":0.999912}],[{"type":"Double","basicValue":8.0635e-8},{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":7.35296e-11},{"type":"Double","basicValue":1}]]},"class_priors":{"type":"Array","elements":[[{"type":"Double","basicValue":0.5}],[{"type":"Double","basicValue":0.5}]]},"theta":{"type":"Array","elements":[[{"type":"Double","basicValue":0.3}],[{"type":"Double","basicValue":1.7}]]},"variances":{"type":"Array","elements":[[{"type":"Double","basicValue":0.06}],[{"type":"Double","basicValue":0.06}]]}}}
Python Code
import numpy as np
from sklearn.naive_bayes import GaussianNB as SklearnGaussianNB
def gaussian_nb(data, target, var_smoothing=1e-09):
    """
    Fit a Gaussian naive Bayes classifier and return training predictions.
    See: https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.GaussianNB.html
    This example function is provided as-is without any representation of accuracy.
    Args:
        data (list[list]): 2D array of numeric feature data with rows as samples and columns as features.
        target (list[list]): Target labels as a single row, single column, or scalar when only one sample is present.
        var_smoothing (float, optional): Portion of the largest feature variance added for numerical stability. Default is 1e-09.
    Returns:
        dict: Excel data type containing training accuracy, predictions, probabilities, and fitted Gaussian parameter arrays.
    """
    def py(value):
        return value.item() if isinstance(value, np.generic) else value
    def cell(value):
        value = py(value)
        if isinstance(value, bool):
            return {"type": "Boolean", "basicValue": bool(value)}
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            return {"type": "Double", "basicValue": float(value)}
        return {"type": "String", "basicValue": str(value)}
    def col(values):
        return [[cell(value)] for value in values]
    def mat(values):
        return [[cell(value) for value in row] for row in values]
    def parse_data(value):
        value = [[value]] if not isinstance(value, list) else value
        if not isinstance(value, list) or not value or not all(isinstance(row, list) and row for row in value):
            return None, "Error: data must be a non-empty 2D list"
        if len({len(row) for row in value}) != 1:
            return None, "Error: data must be a rectangular 2D list"
        data_np = np.array(value, dtype=float)
        if data_np.ndim != 2 or data_np.size == 0:
            return None, "Error: data must be a non-empty 2D list"
        if not np.isfinite(data_np).all():
            return None, "Error: data must contain only finite numeric values"
        return data_np, None
    def parse_target(value, sample_count):
        if not isinstance(value, list):
            labels = [value]
        elif not value:
            return None, "Error: target must be non-empty"
        elif all(not isinstance(item, list) for item in value):
            labels = value
        elif len(value) == 1:
            labels = value[0]
        elif all(isinstance(row, list) and len(row) == 1 for row in value):
            labels = [row[0] for row in value]
        else:
            return None, "Error: target must be a single row or column"
        if len(labels) != sample_count:
            return None, "Error: target length must match sample count"
        parsed = []
        classes = []
        for item in labels:
            item = py(item)
            if isinstance(item, str):
                if not item.strip():
                    return None, "Error: target labels must not be blank"
            elif isinstance(item, bool):
                item = bool(item)
            elif isinstance(item, (int, float)) and not isinstance(item, bool):
                if not np.isfinite(float(item)):
                    return None, "Error: target labels must be finite"
                item = float(item) if isinstance(item, float) else int(item)
            else:
                return None, "Error: target labels must be scalar string, boolean, or numeric values"
            parsed.append(item)
            if not any(type(existing) is type(item) and existing == item for existing in classes):
                classes.append(item)
        if len(classes) < 2:
            return None, "Error: target must contain at least 2 classes"
        return parsed, None
    def count_table(predictions, classes):
        rows = [[{"type": "String", "basicValue": "class"}, {"type": "String", "basicValue": "count"}]]
        for class_label in classes:
            count = sum(type(prediction) is type(class_label) and prediction == class_label for prediction in predictions)
            rows.append([cell(class_label), {"type": "Double", "basicValue": float(count)}])
        return rows
    try:
        data_np, error = parse_data(data)
        if error:
            return error
        target_values, error = parse_target(target, data_np.shape[0])
        if error:
            return error
        if float(var_smoothing) <= 0:
            return "Error: var_smoothing must be greater than 0"
        fitted = SklearnGaussianNB(var_smoothing=float(var_smoothing)).fit(data_np, target_values)
        prediction_array = fitted.predict(data_np)
        predictions = [py(item) for item in prediction_array.tolist()]
        classes = [py(item) for item in fitted.classes_.tolist()]
        accuracy = float(np.mean([
            type(prediction) is type(actual) and prediction == actual
            for prediction, actual in zip(predictions, target_values)
        ]))
        return {
            "type": "Double",
            "basicValue": accuracy,
            "properties": {
                "accuracy": {"type": "Double", "basicValue": accuracy},
                "sample_count": {"type": "Double", "basicValue": float(data_np.shape[0])},
                "feature_count": {"type": "Double", "basicValue": float(data_np.shape[1])},
                "class_count": {"type": "Double", "basicValue": float(len(classes))},
                "classes": {"type": "Array", "elements": col(classes)},
                "predictions": {"type": "Array", "elements": col(predictions)},
                "prediction_counts": {"type": "Array", "elements": count_table(predictions, classes)},
                "probabilities": {"type": "Array", "elements": mat(fitted.predict_proba(data_np).tolist())},
                "class_priors": {"type": "Array", "elements": col(fitted.class_prior_.tolist())},
                "theta": {"type": "Array", "elements": mat(fitted.theta_.tolist())},
                "variances": {"type": "Array", "elements": mat(fitted.var_.tolist())}
            }
        }
    except Exception as e:
        return f"Error: {str(e)}"
GB_CLASSIFY
Gradient boosting classification builds a sequence of shallow decision trees that iteratively correct earlier mistakes. At each stage m, a new weak learner h_m(x) is fit to the negative gradient of the loss and added to the model:
F_{m}(x) = F_{m-1}(x) + \gamma_m h_m(x)
This flexible nonlinear classifier is well suited to tabular data and exposes feature-importance estimates based on how much each feature's splits reduce impurity across the ensemble.
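The additive update F_m = F_{m-1} + \gamma_m h_m is easiest to see with squared loss, where the negative gradient is just the residual y - F_{m-1}(x). The toy sketch below (an illustration of the update rule only, not the deviance-based fitting GradientBoostingClassifier actually performs) boosts depth-1 stumps on a single feature:

```python
import numpy as np

def fit_stump(x, residual):
    # Best depth-1 split on one feature: pick the threshold whose two
    # leaf means give the lowest squared error on the residuals.
    best = None
    for threshold in np.unique(x):
        left, right = residual[x <= threshold], residual[x > threshold]
        if len(left) == 0 or len(right) == 0:
            continue
        pred = np.where(x <= threshold, left.mean(), right.mean())
        err = float(((residual - pred) ** 2).sum())
        if best is None or err < best[0]:
            best = (err, threshold, left.mean(), right.mean())
    return best[1:]

def boost(x, y, n_stages=50, learning_rate=0.1):
    # F_0 is the target mean; each stage fits h_m to y - F_{m-1} (the
    # negative gradient of squared loss) and adds learning_rate * h_m.
    F = np.full(len(y), y.mean())
    for _ in range(n_stages):
        threshold, left_value, right_value = fit_stump(x, y - F)
        F = F + learning_rate * np.where(x <= threshold, left_value, right_value)
    return F

x = np.array([0, 0.2, 0.4, 1.2, 1.4, 1.6])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
print(boost(x, y).round(3))  # close to the 0/1 targets after 50 stages
```

Each stage shrinks the remaining residual by roughly (1 - learning_rate), which is why more stages with a smaller learning rate usually generalize better than a few aggressive ones.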
This wrapper accepts rows as samples and a target supplied as a single row or single column. It returns training accuracy together with predicted labels, class counts, class probabilities, and fitted feature importances.
Excel Usage
=GB_CLASSIFY(data, target, n_estimators, learning_rate, max_depth, subsample, random_state)
data (list[list], required): 2D array of numeric feature data with rows as samples and columns as features.
target (list[list], required): Target labels as a single row, single column, or scalar when only one sample is present.
n_estimators (int, optional, default: 100): Number of boosting stages to fit.
learning_rate (float, optional, default: 0.1): Shrinkage factor applied to each boosting stage.
max_depth (int, optional, default: 3): Maximum depth of each individual regression tree.
subsample (float, optional, default: 1): Fraction of samples used to fit each boosting stage.
random_state (int, optional, default: null): Integer seed for reproducible boosting and tree construction. Leave blank for the estimator default.
Returns (dict): Excel data type containing training accuracy, predictions, probabilities, and fitted feature importances.
Example 1: Fit a gradient boosting classifier for two string-labeled groups
Inputs:
| data | | target | n_estimators | learning_rate | max_depth | subsample | random_state |
|---|---|---|---|---|---|---|---|
| 0 | 0 | cold | 50 | 0.1 | 2 | 1 | 0 |
| 0 | 1 | cold | |||||
| 1 | 0 | cold | |||||
| 2 | 2 | hot | |||||
| 2 | 3 | hot | |||||
| 3 | 2 | hot |
Excel formula:
=GB_CLASSIFY({0,0;0,1;1,0;2,2;2,3;3,2}, {"cold";"cold";"cold";"hot";"hot";"hot"}, 50, 0.1, 2, 1, 0)
Expected output:
{"type":"Double","basicValue":1,"properties":{"accuracy":{"type":"Double","basicValue":1},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":2},"class_count":{"type":"Double","basicValue":2},"classes":{"type":"Array","elements":[[{"type":"String","basicValue":"cold"}],[{"type":"String","basicValue":"hot"}]]},"predictions":{"type":"Array","elements":[[{"type":"String","basicValue":"cold"}],[{"type":"String","basicValue":"cold"}],[{"type":"String","basicValue":"cold"}],[{"type":"String","basicValue":"hot"}],[{"type":"String","basicValue":"hot"}],[{"type":"String","basicValue":"hot"}]]},"prediction_counts":{"type":"Array","elements":[[{"type":"String","basicValue":"class"},{"type":"String","basicValue":"count"}],[{"type":"String","basicValue":"cold"},{"type":"Double","basicValue":3}],[{"type":"String","basicValue":"hot"},{"type":"Double","basicValue":3}]]},"probabilities":{"type":"Array","elements":[[{"type":"Double","basicValue":0.996749},{"type":"Double","basicValue":0.00325104}],[{"type":"Double","basicValue":0.996749},{"type":"Double","basicValue":0.00325104}],[{"type":"Double","basicValue":0.996749},{"type":"Double","basicValue":0.00325104}],[{"type":"Double","basicValue":0.00325104},{"type":"Double","basicValue":0.996749}],[{"type":"Double","basicValue":0.00325104},{"type":"Double","basicValue":0.996749}],[{"type":"Double","basicValue":0.00325104},{"type":"Double","basicValue":0.996749}]]},"feature_importances":{"type":"Array","elements":[[{"type":"Double","basicValue":0.293415}],[{"type":"Double","basicValue":0.706585}]]},"estimator_count":{"type":"Double","basicValue":50}}}
Example 2: Fit gradient boosting for one-dimensional numeric labels
Inputs:
| data | target | n_estimators | learning_rate | max_depth | subsample | random_state |
|---|---|---|---|---|---|---|
| 0 | 0 | 50 | 0.1 | 2 | 1 | 0 |
| 0.2 | 0 | |||||
| 0.4 | 0 | |||||
| 1.2 | 1 | |||||
| 1.4 | 1 | |||||
| 1.6 | 1 |
Excel formula:
=GB_CLASSIFY({0;0.2;0.4;1.2;1.4;1.6}, {0;0;0;1;1;1}, 50, 0.1, 2, 1, 0)
Expected output:
{"type":"Double","basicValue":1,"properties":{"accuracy":{"type":"Double","basicValue":1},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":1},"class_count":{"type":"Double","basicValue":2},"classes":{"type":"Array","elements":[[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1}]]},"predictions":{"type":"Array","elements":[[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":1}]]},"prediction_counts":{"type":"Array","elements":[[{"type":"String","basicValue":"class"},{"type":"String","basicValue":"count"}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":3}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":3}]]},"probabilities":{"type":"Array","elements":[[{"type":"Double","basicValue":0.996749},{"type":"Double","basicValue":0.00325104}],[{"type":"Double","basicValue":0.996749},{"type":"Double","basicValue":0.00325104}],[{"type":"Double","basicValue":0.996749},{"type":"Double","basicValue":0.00325104}],[{"type":"Double","basicValue":0.00325104},{"type":"Double","basicValue":0.996749}],[{"type":"Double","basicValue":0.00325104},{"type":"Double","basicValue":0.996749}],[{"type":"Double","basicValue":0.00325104},{"type":"Double","basicValue":0.996749}]]},"feature_importances":{"type":"Array","elements":[[{"type":"Double","basicValue":1}]]},"estimator_count":{"type":"Double","basicValue":50}}}
Example 3: Fit a gradient boosting classifier for three separated groups
Inputs:
| data | | target | n_estimators | learning_rate | max_depth | subsample | random_state |
|---|---|---|---|---|---|---|---|
| 0 | 0 | left | 50 | 0.1 | 2 | 1 | 0 |
| 0.2 | 0.1 | left | |||||
| 4 | 4 | center | |||||
| 4.2 | 3.9 | center | |||||
| 8 | 0 | right | |||||
| 8.2 | 0.1 | right |
Excel formula:
=GB_CLASSIFY({0,0;0.2,0.1;4,4;4.2,3.9;8,0;8.2,0.1}, {"left";"left";"center";"center";"right";"right"}, 50, 0.1, 2, 1, 0)
Expected output:
{"type":"Double","basicValue":1,"properties":{"accuracy":{"type":"Double","basicValue":1},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":2},"class_count":{"type":"Double","basicValue":3},"classes":{"type":"Array","elements":[[{"type":"String","basicValue":"center"}],[{"type":"String","basicValue":"left"}],[{"type":"String","basicValue":"right"}]]},"predictions":{"type":"Array","elements":[[{"type":"String","basicValue":"left"}],[{"type":"String","basicValue":"left"}],[{"type":"String","basicValue":"center"}],[{"type":"String","basicValue":"center"}],[{"type":"String","basicValue":"right"}],[{"type":"String","basicValue":"right"}]]},"prediction_counts":{"type":"Array","elements":[[{"type":"String","basicValue":"class"},{"type":"String","basicValue":"count"}],[{"type":"String","basicValue":"center"},{"type":"Double","basicValue":2}],[{"type":"String","basicValue":"left"},{"type":"Double","basicValue":2}],[{"type":"String","basicValue":"right"},{"type":"Double","basicValue":2}]]},"probabilities":{"type":"Array","elements":[[{"type":"Double","basicValue":0.000505015},{"type":"Double","basicValue":0.99899},{"type":"Double","basicValue":0.000505015}],[{"type":"Double","basicValue":0.000505015},{"type":"Double","basicValue":0.99899},{"type":"Double","basicValue":0.000505015}],[{"type":"Double","basicValue":0.99899},{"type":"Double","basicValue":0.000505015},{"type":"Double","basicValue":0.000505015}],[{"type":"Double","basicValue":0.99899},{"type":"Double","basicValue":0.000505015},{"type":"Double","basicValue":0.000505015}],[{"type":"Double","basicValue":0.000505015},{"type":"Double","basicValue":0.000505015},{"type":"Double","basicValue":0.99899}],[{"type":"Double","basicValue":0.000505015},{"type":"Double","basicValue":0.000505015},{"type":"Double","basicValue":0.99899}]]},"feature_importances":{"type":"Array","elements":[[{"type":"Double","basicValue":0.666667}],[{"type":"Double","basicValue":0.333333}]]},"estimator_count":{"type":"Double","basicValue":50}}}
Example 4: Flatten a single-row boolean target range for gradient boosting classification
Inputs:
| data | target | | | | | | n_estimators | learning_rate | max_depth | subsample | random_state |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | false | false | false | true | true | true | 50 | 0.1 | 2 | 1 | 0 |
| 0.3 | |||||||||||
| 0.6 | |||||||||||
| 1.4 | |||||||||||
| 1.7 | |||||||||||
| 2 |
Excel formula:
=GB_CLASSIFY({0;0.3;0.6;1.4;1.7;2}, {FALSE,FALSE,FALSE,TRUE,TRUE,TRUE}, 50, 0.1, 2, 1, 0)
Expected output:
{"type":"Double","basicValue":1,"properties":{"accuracy":{"type":"Double","basicValue":1},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":1},"class_count":{"type":"Double","basicValue":2},"classes":{"type":"Array","elements":[[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":true}]]},"predictions":{"type":"Array","elements":[[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":true}],[{"type":"Boolean","basicValue":true}],[{"type":"Boolean","basicValue":true}]]},"prediction_counts":{"type":"Array","elements":[[{"type":"String","basicValue":"class"},{"type":"String","basicValue":"count"}],[{"type":"Boolean","basicValue":false},{"type":"Double","basicValue":3}],[{"type":"Boolean","basicValue":true},{"type":"Double","basicValue":3}]]},"probabilities":{"type":"Array","elements":[[{"type":"Double","basicValue":0.996749},{"type":"Double","basicValue":0.00325104}],[{"type":"Double","basicValue":0.996749},{"type":"Double","basicValue":0.00325104}],[{"type":"Double","basicValue":0.996749},{"type":"Double","basicValue":0.00325104}],[{"type":"Double","basicValue":0.00325104},{"type":"Double","basicValue":0.996749}],[{"type":"Double","basicValue":0.00325104},{"type":"Double","basicValue":0.996749}],[{"type":"Double","basicValue":0.00325104},{"type":"Double","basicValue":0.996749}]]},"feature_importances":{"type":"Array","elements":[[{"type":"Double","basicValue":1}]]},"estimator_count":{"type":"Double","basicValue":50}}}
Python Code
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier as SklearnGradientBoostingClassifier
def gb_classify(data, target, n_estimators=100, learning_rate=0.1, max_depth=3, subsample=1, random_state=None):
    """
    Fit a gradient boosting classifier and return training predictions.
    See: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html
    This example function is provided as-is without any representation of accuracy.
    Args:
        data (list[list]): 2D array of numeric feature data with rows as samples and columns as features.
        target (list[list]): Target labels as a single row, single column, or scalar when only one sample is present.
        n_estimators (int, optional): Number of boosting stages to fit. Default is 100.
        learning_rate (float, optional): Shrinkage factor applied to each boosting stage. Default is 0.1.
        max_depth (int, optional): Maximum depth of each individual regression tree. Default is 3.
        subsample (float, optional): Fraction of samples used to fit each boosting stage. Default is 1.
        random_state (int, optional): Integer seed for reproducible boosting and tree construction. Leave blank for the estimator default. Default is None.
    Returns:
        dict: Excel data type containing training accuracy, predictions, probabilities, and fitted feature importances.
    """
    def py(value):
        return value.item() if isinstance(value, np.generic) else value
    def cell(value):
        value = py(value)
        if isinstance(value, bool):
            return {"type": "Boolean", "basicValue": bool(value)}
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            return {"type": "Double", "basicValue": float(value)}
        return {"type": "String", "basicValue": str(value)}
    def col(values):
        return [[cell(value)] for value in values]
    def mat(values):
        return [[cell(value) for value in row] for row in values]
    def parse_data(value):
        value = [[value]] if not isinstance(value, list) else value
        if not isinstance(value, list) or not value or not all(isinstance(row, list) and row for row in value):
            return None, "Error: data must be a non-empty 2D list"
        if len({len(row) for row in value}) != 1:
            return None, "Error: data must be a rectangular 2D list"
        data_np = np.array(value, dtype=float)
        if data_np.ndim != 2 or data_np.size == 0:
            return None, "Error: data must be a non-empty 2D list"
        if not np.isfinite(data_np).all():
            return None, "Error: data must contain only finite numeric values"
        return data_np, None
    def parse_target(value, sample_count):
        if not isinstance(value, list):
            labels = [value]
        elif not value:
            return None, "Error: target must be non-empty"
        elif all(not isinstance(item, list) for item in value):
            labels = value
        elif len(value) == 1:
            labels = value[0]
        elif all(isinstance(row, list) and len(row) == 1 for row in value):
            labels = [row[0] for row in value]
        else:
            return None, "Error: target must be a single row or column"
        if len(labels) != sample_count:
            return None, "Error: target length must match sample count"
        parsed = []
        classes = []
        for item in labels:
            item = py(item)
            if isinstance(item, str):
                if not item.strip():
                    return None, "Error: target labels must not be blank"
            elif isinstance(item, bool):
                item = bool(item)
            elif isinstance(item, (int, float)) and not isinstance(item, bool):
                if not np.isfinite(float(item)):
                    return None, "Error: target labels must be finite"
                item = float(item) if isinstance(item, float) else int(item)
            else:
                return None, "Error: target labels must be scalar string, boolean, or numeric values"
            parsed.append(item)
            if not any(type(existing) is type(item) and existing == item for existing in classes):
                classes.append(item)
        if len(classes) < 2:
            return None, "Error: target must contain at least 2 classes"
        return parsed, None
    def count_table(predictions, classes):
        rows = [[{"type": "String", "basicValue": "class"}, {"type": "String", "basicValue": "count"}]]
        for class_label in classes:
            count = sum(type(prediction) is type(class_label) and prediction == class_label for prediction in predictions)
            rows.append([cell(class_label), {"type": "Double", "basicValue": float(count)}])
        return rows
    try:
        data_np, error = parse_data(data)
        if error:
            return error
        target_values, error = parse_target(target, data_np.shape[0])
        if error:
            return error
        if int(n_estimators) < 1:
            return "Error: n_estimators must be at least 1"
        if float(learning_rate) <= 0:
            return "Error: learning_rate must be greater than 0"
        if int(max_depth) < 1:
            return "Error: max_depth must be at least 1"
        if float(subsample) <= 0 or float(subsample) > 1:
            return "Error: subsample must be greater than 0 and at most 1"
        fitted = SklearnGradientBoostingClassifier(
            n_estimators=int(n_estimators),
            learning_rate=float(learning_rate),
            max_depth=int(max_depth),
            subsample=float(subsample),
            random_state=None if random_state in (None, "") else int(random_state)
        ).fit(data_np, target_values)
        prediction_array = fitted.predict(data_np)
        predictions = [py(item) for item in prediction_array.tolist()]
        classes = [py(item) for item in fitted.classes_.tolist()]
        accuracy = float(np.mean([
            type(prediction) is type(actual) and prediction == actual
            for prediction, actual in zip(predictions, target_values)
        ]))
        return {
            "type": "Double",
            "basicValue": accuracy,
            "properties": {
                "accuracy": {"type": "Double", "basicValue": accuracy},
                "sample_count": {"type": "Double", "basicValue": float(data_np.shape[0])},
                "feature_count": {"type": "Double", "basicValue": float(data_np.shape[1])},
                "class_count": {"type": "Double", "basicValue": float(len(classes))},
                "classes": {"type": "Array", "elements": col(classes)},
                "predictions": {"type": "Array", "elements": col(predictions)},
                "prediction_counts": {"type": "Array", "elements": count_table(predictions, classes)},
                "probabilities": {"type": "Array", "elements": mat(fitted.predict_proba(data_np).tolist())},
                "feature_importances": {"type": "Array", "elements": col(fitted.feature_importances_.tolist())},
                "estimator_count": {"type": "Double", "basicValue": float(fitted.n_estimators_)}
            }
        }
    except Exception as e:
        return f"Error: {str(e)}"
KNN_CLASSIFY
K-nearest neighbors classification predicts the label of a sample based on the labels of its k closest neighbors in the training set. The distance between samples x and y is typically measured using the Minkowski distance:
d(x, y) = \left( \sum_{i=1}^n |x_i - y_i|^p \right)^{1/p}
When p=2, this corresponds to the standard Euclidean distance, and when p=1, it is the Manhattan distance. The function can also return class probabilities derived from the proportion of neighbors supporting each class.
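The relationship between the Minkowski, Euclidean, and Manhattan distances can be checked directly with plain NumPy. This is an illustrative sketch independent of the wrapper below; the helper name `minkowski` is ours, not part of the function's API:

```python
import numpy as np

def minkowski(x, y, p=2):
    # Minkowski distance: (sum_i |x_i - y_i|^p)^(1/p)
    diff = np.abs(np.asarray(x, dtype=float) - np.asarray(y, dtype=float))
    return float(np.sum(diff ** p) ** (1.0 / p))

a, b = [0.0, 0.0], [3.0, 4.0]
print(minkowski(a, b, p=2))  # Euclidean distance: 5.0
print(minkowski(a, b, p=1))  # Manhattan distance: 7.0
```

With p=2 the result matches the familiar 3-4-5 right triangle; with p=1 it is simply the sum of the coordinate differences.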
This wrapper accepts rows as samples and a target supplied as a single row or single column. It returns training accuracy together with predicted labels, class counts, fitted class probabilities, and the resolved distance metric.
Excel Usage
=KNN_CLASSIFY(data, target, n_neighbors, knn_weights, knn_metric, p)
- data (list[list], required): 2D array of numeric feature data with rows as samples and columns as features.
- target (list[list], required): Target labels as a single row, single column, or scalar when only one sample is present.
- n_neighbors (int, optional, default: 5): Number of nearest neighbors used for each vote.
- knn_weights (str, optional, default: "uniform"): Weighting scheme used when aggregating neighbor votes.
- knn_metric (str, optional, default: "minkowski"): Distance metric used to compare samples.
- p (int, optional, default: 2): Power parameter for the Minkowski metric.
Returns (dict): Excel data type containing training accuracy, predictions, probabilities, and k-nearest-neighbor summary properties.
Example 1: Classify two string-labeled groups with uniform neighbor votes
Inputs:
| data | target | n_neighbors | knn_weights | knn_metric | p |
|---|---|---|---|---|---|
| 0 | low | 3 | uniform | euclidean | 2 |
| 0.1 | low | ||||
| 0.2 | low | ||||
| 1.5 | high | ||||
| 1.6 | high | ||||
| 1.7 | high |
Excel formula:
=KNN_CLASSIFY({0;0.1;0.2;1.5;1.6;1.7}, {"low";"low";"low";"high";"high";"high"}, 3, "uniform", "euclidean", 2)
Expected output:
{"type":"Double","basicValue":1,"properties":{"accuracy":{"type":"Double","basicValue":1},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":1},"class_count":{"type":"Double","basicValue":2},"classes":{"type":"Array","elements":[[{"type":"String","basicValue":"high"}],[{"type":"String","basicValue":"low"}]]},"predictions":{"type":"Array","elements":[[{"type":"String","basicValue":"low"}],[{"type":"String","basicValue":"low"}],[{"type":"String","basicValue":"low"}],[{"type":"String","basicValue":"high"}],[{"type":"String","basicValue":"high"}],[{"type":"String","basicValue":"high"}]]},"prediction_counts":{"type":"Array","elements":[[{"type":"String","basicValue":"class"},{"type":"String","basicValue":"count"}],[{"type":"String","basicValue":"high"},{"type":"Double","basicValue":3}],[{"type":"String","basicValue":"low"},{"type":"Double","basicValue":3}]]},"probabilities":{"type":"Array","elements":[[{"type":"Double","basicValue":0},{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":0}]]},"neighbor_count":{"type":"Double","basicValue":3},"effective_metric":{"type":"String","basicValue":"euclidean"}}}
Example 2: Use distance weighting for numeric target labels
Inputs:
| data | target | n_neighbors | knn_weights | knn_metric | p | |
|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 3 | distance | euclidean | 2 |
| 0 | 0.2 | 0 | ||||
| 0.2 | 0 | 0 | ||||
| 2 | 2 | 1 | ||||
| 2.1 | 2 | 1 | ||||
| 2 | 2.1 | 1 |
Excel formula:
=KNN_CLASSIFY({0,0;0,0.2;0.2,0;2,2;2.1,2;2,2.1}, {0;0;0;1;1;1}, 3, "distance", "euclidean", 2)
Expected output:
{"type":"Double","basicValue":1,"properties":{"accuracy":{"type":"Double","basicValue":1},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":2},"class_count":{"type":"Double","basicValue":2},"classes":{"type":"Array","elements":[[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1}]]},"predictions":{"type":"Array","elements":[[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":1}]]},"prediction_counts":{"type":"Array","elements":[[{"type":"String","basicValue":"class"},{"type":"String","basicValue":"count"}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":3}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":3}]]},"probabilities":{"type":"Array","elements":[[{"type":"Double","basicValue":1},{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":1}]]},"neighbor_count":{"type":"Double","basicValue":3},"effective_metric":{"type":"String","basicValue":"euclidean"}}}
Example 3: Fit k-nearest neighbors for three groups with Manhattan distance
Inputs:
| data | target | n_neighbors | knn_weights | knn_metric | p | |
|---|---|---|---|---|---|---|
| 0 | 0 | left | 1 | uniform | manhattan | 1 |
| 0.1 | 0.2 | left | ||||
| 4 | 4 | center | ||||
| 4.1 | 3.9 | center | ||||
| 8 | 0 | right | ||||
| 8.1 | 0.2 | right |
Excel formula:
=KNN_CLASSIFY({0,0;0.1,0.2;4,4;4.1,3.9;8,0;8.1,0.2}, {"left";"left";"center";"center";"right";"right"}, 1, "uniform", "manhattan", 1)
Expected output:
{"type":"Double","basicValue":1,"properties":{"accuracy":{"type":"Double","basicValue":1},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":2},"class_count":{"type":"Double","basicValue":3},"classes":{"type":"Array","elements":[[{"type":"String","basicValue":"center"}],[{"type":"String","basicValue":"left"}],[{"type":"String","basicValue":"right"}]]},"predictions":{"type":"Array","elements":[[{"type":"String","basicValue":"left"}],[{"type":"String","basicValue":"left"}],[{"type":"String","basicValue":"center"}],[{"type":"String","basicValue":"center"}],[{"type":"String","basicValue":"right"}],[{"type":"String","basicValue":"right"}]]},"prediction_counts":{"type":"Array","elements":[[{"type":"String","basicValue":"class"},{"type":"String","basicValue":"count"}],[{"type":"String","basicValue":"center"},{"type":"Double","basicValue":2}],[{"type":"String","basicValue":"left"},{"type":"Double","basicValue":2}],[{"type":"String","basicValue":"right"},{"type":"Double","basicValue":2}]]},"probabilities":{"type":"Array","elements":[[{"type":"Double","basicValue":0},{"type":"Double","basicValue":1},{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":1},{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":0},{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":0},{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":0},{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":0},{"type":"Double","basicValue":1}]]},"neighbor_count":{"type":"Double","basicValue":1},"effective_metric":{"type":"String","basicValue":"manhattan"}}}
Example 4: Flatten a single-row boolean target range for k-nearest neighbors
Inputs:
| data | target | n_neighbors | knn_weights | knn_metric | p | |||||
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | false | false | false | true | true | true | 1 | uniform | euclidean | 2 |
| 0.3 | ||||||||||
| 0.6 | ||||||||||
| 1.4 | ||||||||||
| 1.7 | ||||||||||
| 2 |
Excel formula:
=KNN_CLASSIFY({0;0.3;0.6;1.4;1.7;2}, {FALSE,FALSE,FALSE,TRUE,TRUE,TRUE}, 1, "uniform", "euclidean", 2)
Expected output:
{"type":"Double","basicValue":1,"properties":{"accuracy":{"type":"Double","basicValue":1},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":1},"class_count":{"type":"Double","basicValue":2},"classes":{"type":"Array","elements":[[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":true}]]},"predictions":{"type":"Array","elements":[[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":true}],[{"type":"Boolean","basicValue":true}],[{"type":"Boolean","basicValue":true}]]},"prediction_counts":{"type":"Array","elements":[[{"type":"String","basicValue":"class"},{"type":"String","basicValue":"count"}],[{"type":"Boolean","basicValue":false},{"type":"Double","basicValue":3}],[{"type":"Boolean","basicValue":true},{"type":"Double","basicValue":3}]]},"probabilities":{"type":"Array","elements":[[{"type":"Double","basicValue":1},{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":1}]]},"neighbor_count":{"type":"Double","basicValue":1},"effective_metric":{"type":"String","basicValue":"euclidean"}}}
Python Code
import numpy as np
from sklearn.neighbors import KNeighborsClassifier as SklearnKNeighborsClassifier
def knn_classify(data, target, n_neighbors=5, knn_weights='uniform', knn_metric='minkowski', p=2):
    """
    Fit a k-nearest neighbors classifier and return training predictions.
    See: https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html
    This example function is provided as-is without any representation of accuracy.
    Args:
        data (list[list]): 2D array of numeric feature data with rows as samples and columns as features.
        target (list[list]): Target labels as a single row, single column, or scalar when only one sample is present.
        n_neighbors (int, optional): Number of nearest neighbors used for each vote. Default is 5.
        knn_weights (str, optional): Weighting scheme used when aggregating neighbor votes. Valid options: Uniform, Distance. Default is 'uniform'.
        knn_metric (str, optional): Distance metric used to compare samples. Valid options: Minkowski, Euclidean, Manhattan. Default is 'minkowski'.
        p (int, optional): Power parameter for the Minkowski metric. Default is 2.
    Returns:
        dict: Excel data type containing training accuracy, predictions, probabilities, and k-nearest-neighbor summary properties.
    """
    def py(value):
        return value.item() if isinstance(value, np.generic) else value
    def cell(value):
        value = py(value)
        if isinstance(value, bool):
            return {"type": "Boolean", "basicValue": bool(value)}
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            return {"type": "Double", "basicValue": float(value)}
        return {"type": "String", "basicValue": str(value)}
    def col(values):
        return [[cell(value)] for value in values]
    def mat(values):
        return [[cell(value) for value in row] for row in values]
    def parse_data(value):
        value = [[value]] if not isinstance(value, list) else value
        if not isinstance(value, list) or not value or not all(isinstance(row, list) and row for row in value):
            return None, "Error: data must be a non-empty 2D list"
        if len({len(row) for row in value}) != 1:
            return None, "Error: data must be a rectangular 2D list"
        data_np = np.array(value, dtype=float)
        if data_np.ndim != 2 or data_np.size == 0:
            return None, "Error: data must be a non-empty 2D list"
        if not np.isfinite(data_np).all():
            return None, "Error: data must contain only finite numeric values"
        return data_np, None
    def parse_target(value, sample_count):
        if not isinstance(value, list):
            labels = [value]
        elif not value:
            return None, "Error: target must be non-empty"
        elif all(not isinstance(item, list) for item in value):
            labels = value
        elif len(value) == 1:
            labels = value[0]
        elif all(isinstance(row, list) and len(row) == 1 for row in value):
            labels = [row[0] for row in value]
        else:
            return None, "Error: target must be a single row or column"
        if len(labels) != sample_count:
            return None, "Error: target length must match sample count"
        parsed = []
        classes = []
        for item in labels:
            item = py(item)
            if isinstance(item, str):
                if not item.strip():
                    return None, "Error: target labels must not be blank"
            elif isinstance(item, bool):
                item = bool(item)
            elif isinstance(item, (int, float)) and not isinstance(item, bool):
                if not np.isfinite(float(item)):
                    return None, "Error: target labels must be finite"
                item = float(item) if isinstance(item, float) else int(item)
            else:
                return None, "Error: target labels must be scalar string, boolean, or numeric values"
            parsed.append(item)
            if not any(type(existing) is type(item) and existing == item for existing in classes):
                classes.append(item)
        if len(classes) < 2:
            return None, "Error: target must contain at least 2 classes"
        return parsed, None
    def count_table(predictions, classes):
        rows = [[{"type": "String", "basicValue": "class"}, {"type": "String", "basicValue": "count"}]]
        for class_label in classes:
            count = sum(type(prediction) is type(class_label) and prediction == class_label for prediction in predictions)
            rows.append([cell(class_label), {"type": "Double", "basicValue": float(count)}])
        return rows
    try:
        data_np, error = parse_data(data)
        if error:
            return error
        target_values, error = parse_target(target, data_np.shape[0])
        if error:
            return error
        neighbor_total = int(n_neighbors)
        if neighbor_total < 1:
            return "Error: n_neighbors must be at least 1"
        if neighbor_total > data_np.shape[0]:
            return "Error: n_neighbors cannot exceed the number of samples"
        weights_value = str(knn_weights).strip().lower()
        if weights_value not in {"uniform", "distance"}:
            return "Error: weights must be 'uniform' or 'distance'"
        metric_value = str(knn_metric).strip().lower()
        if metric_value not in {"minkowski", "euclidean", "manhattan"}:
            return "Error: metric must be 'minkowski', 'euclidean', or 'manhattan'"
        if int(p) < 1:
            return "Error: p must be at least 1"
        fitted = SklearnKNeighborsClassifier(
            n_neighbors=neighbor_total,
            weights=weights_value,
            metric=metric_value,
            p=int(p)
        ).fit(data_np, target_values)
        prediction_array = fitted.predict(data_np)
        predictions = [py(item) for item in prediction_array.tolist()]
        classes = [py(item) for item in fitted.classes_.tolist()]
        accuracy = float(np.mean([
            type(prediction) is type(actual) and prediction == actual
            for prediction, actual in zip(predictions, target_values)
        ]))
        return {
            "type": "Double",
            "basicValue": accuracy,
            "properties": {
                "accuracy": {"type": "Double", "basicValue": accuracy},
                "sample_count": {"type": "Double", "basicValue": float(data_np.shape[0])},
                "feature_count": {"type": "Double", "basicValue": float(data_np.shape[1])},
                "class_count": {"type": "Double", "basicValue": float(len(classes))},
                "classes": {"type": "Array", "elements": col(classes)},
                "predictions": {"type": "Array", "elements": col(predictions)},
                "prediction_counts": {"type": "Array", "elements": count_table(predictions, classes)},
                "probabilities": {"type": "Array", "elements": mat(fitted.predict_proba(data_np).tolist())},
                "neighbor_count": {"type": "Double", "basicValue": float(neighbor_total)},
                "effective_metric": {"type": "String", "basicValue": str(fitted.effective_metric_)}
            }
        }
    except Exception as e:
        return f"Error: {str(e)}"
LINEAR_SVC
Linear support vector classification fits a max-margin linear decision boundary. The decision function for a sample x is defined as:
f(x) = w^T x + b
The model parameters w and b are determined by minimizing a combination of the squared norm of w and the hinge loss across all training samples:
\min_{w, b} \frac{1}{2} \|w\|^2 + C \sum_{i=1}^n \max(0, 1 - y_i(w^T x_i + b))
This wrapper accepts rows as samples and a target supplied as a single row or single column. It returns training accuracy together with predicted labels, class counts, decision scores, and fitted coefficient arrays.
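The objective above can be evaluated by hand for fixed parameters. This sketch uses made-up values for w, b, C, and the samples (not a fitted model), and assumes labels encoded as ±1 as in the hinge-loss formula:

```python
import numpy as np

# Hypothetical parameters and data for illustration only.
w, b, C = np.array([1.0, 1.0]), -2.0, 1.0
X = np.array([[0.0, 0.0], [3.0, 2.0]])
y = np.array([-1.0, 1.0])  # labels encoded as -1/+1

scores = X @ w + b                          # decision function f(x) = w^T x + b
hinge = np.maximum(0.0, 1.0 - y * scores)   # per-sample hinge loss
objective = 0.5 * float(w @ w) + C * float(hinge.sum())
print(scores.tolist())  # [-2.0, 3.0]
print(objective)        # 1.0 (both samples beyond the margin, so only the norm term remains)
```

Samples with y_i f(x_i) >= 1 sit outside the margin and contribute zero loss; only margin violations add to the objective.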
Excel Usage
=LINEAR_SVC(data, target, penalty, loss, C, max_iter, tol, fit_intercept, random_state)
- data (list[list], required): 2D array of numeric feature data with rows as samples and columns as features.
- target (list[list], required): Target labels as a single row, single column, or scalar when only one sample is present.
- penalty (str, optional, default: "l2"): Norm used in the linear SVM penalty term.
- loss (str, optional, default: "squared_hinge"): Hinge-style loss function used during fitting.
- C (float, optional, default: 1): Inverse regularization strength. Smaller values apply stronger regularization.
- max_iter (int, optional, default: 1000): Maximum number of optimization iterations.
- tol (float, optional, default: 0.0001): Convergence tolerance for the optimizer.
- fit_intercept (bool, optional, default: true): Whether to include an intercept term in the linear decision function.
- random_state (int, optional, default: null): Integer seed used when the underlying solver shuffles data. Leave blank for the estimator default.
Returns (dict): Excel data type containing training accuracy, predictions, decision scores, and fitted coefficient arrays.
Example 1: Fit linear support vector classification for two string-labeled classes
Inputs:
| data | target | penalty | loss | C | max_iter | tol | fit_intercept | random_state | |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | cold | l2 | squared_hinge | 1 | 4000 | 0.0001 | true | 0 |
| 0 | 1 | cold | |||||||
| 1 | 0 | cold | |||||||
| 2 | 2 | hot | |||||||
| 2 | 3 | hot | |||||||
| 3 | 2 | hot |
Excel formula:
=LINEAR_SVC({0,0;0,1;1,0;2,2;2,3;3,2}, {"cold";"cold";"cold";"hot";"hot";"hot"}, "l2", "squared_hinge", 1, 4000, 0.0001, TRUE, 0)
Expected output:
{"type":"Double","basicValue":1,"properties":{"accuracy":{"type":"Double","basicValue":1},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":2},"class_count":{"type":"Double","basicValue":2},"classes":{"type":"Array","elements":[[{"type":"String","basicValue":"cold"}],[{"type":"String","basicValue":"hot"}]]},"predictions":{"type":"Array","elements":[[{"type":"String","basicValue":"cold"}],[{"type":"String","basicValue":"cold"}],[{"type":"String","basicValue":"cold"}],[{"type":"String","basicValue":"hot"}],[{"type":"String","basicValue":"hot"}],[{"type":"String","basicValue":"hot"}]]},"prediction_counts":{"type":"Array","elements":[[{"type":"String","basicValue":"class"},{"type":"String","basicValue":"count"}],[{"type":"String","basicValue":"cold"},{"type":"Double","basicValue":3}],[{"type":"String","basicValue":"hot"},{"type":"Double","basicValue":3}]]},"decision_scores":{"type":"Array","elements":[[{"type":"Double","basicValue":-1.01639}],[{"type":"Double","basicValue":-0.590164}],[{"type":"Double","basicValue":-0.590164}],[{"type":"Double","basicValue":0.688525}],[{"type":"Double","basicValue":1.11475}],[{"type":"Double","basicValue":1.11475}]]},"coefficients":{"type":"Array","elements":[[{"type":"Double","basicValue":0.42623},{"type":"Double","basicValue":0.42623}]]},"intercepts":{"type":"Array","elements":[[{"type":"Double","basicValue":-1.01639}]]}}}
Example 2: Use hinge loss for one-dimensional numeric labels
Inputs:
| data | target | penalty | loss | C | max_iter | tol | fit_intercept | random_state |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | l2 | hinge | 1 | 4000 | 0.0001 | true | 0 |
| 0.2 | 0 | |||||||
| 0.4 | 0 | |||||||
| 1.2 | 1 | |||||||
| 1.4 | 1 | |||||||
| 1.6 | 1 |
Excel formula:
=LINEAR_SVC({0;0.2;0.4;1.2;1.4;1.6}, {0;0;0;1;1;1}, "l2", "hinge", 1, 4000, 0.0001, TRUE, 0)
Expected output:
{"type":"Double","basicValue":1,"properties":{"accuracy":{"type":"Double","basicValue":1},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":1},"class_count":{"type":"Double","basicValue":2},"classes":{"type":"Array","elements":[[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1}]]},"predictions":{"type":"Array","elements":[[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":1}]]},"prediction_counts":{"type":"Array","elements":[[{"type":"String","basicValue":"class"},{"type":"String","basicValue":"count"}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":3}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":3}]]},"decision_scores":{"type":"Array","elements":[[{"type":"Double","basicValue":-1.00001}],[{"type":"Double","basicValue":-0.71429}],[{"type":"Double","basicValue":-0.428575}],[{"type":"Double","basicValue":0.714285}],[{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":1.28572}]]},"coefficients":{"type":"Array","elements":[[{"type":"Double","basicValue":1.42858}]]},"intercepts":{"type":"Array","elements":[[{"type":"Double","basicValue":-1.00001}]]}}}
Example 3: Fit linear support vector classification for three groups
Inputs:
| data | target | penalty | loss | C | max_iter | tol | fit_intercept | random_state | |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | left | l2 | squared_hinge | 1 | 4000 | 0.0001 | true | 0 |
| 0.2 | 0.1 | left | |||||||
| 4 | 4 | center | |||||||
| 4.2 | 3.9 | center | |||||||
| 8 | 0 | right | |||||||
| 8.2 | 0.1 | right |
Excel formula:
=LINEAR_SVC({0,0;0.2,0.1;4,4;4.2,3.9;8,0;8.2,0.1}, {"left";"left";"center";"center";"right";"right"}, "l2", "squared_hinge", 1, 4000, 0.0001, TRUE, 0)
Expected output:
{"type":"Double","basicValue":1,"properties":{"accuracy":{"type":"Double","basicValue":1},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":2},"class_count":{"type":"Double","basicValue":3},"classes":{"type":"Array","elements":[[{"type":"String","basicValue":"center"}],[{"type":"String","basicValue":"left"}],[{"type":"String","basicValue":"right"}]]},"predictions":{"type":"Array","elements":[[{"type":"String","basicValue":"left"}],[{"type":"String","basicValue":"left"}],[{"type":"String","basicValue":"center"}],[{"type":"String","basicValue":"center"}],[{"type":"String","basicValue":"right"}],[{"type":"String","basicValue":"right"}]]},"prediction_counts":{"type":"Array","elements":[[{"type":"String","basicValue":"class"},{"type":"String","basicValue":"count"}],[{"type":"String","basicValue":"center"},{"type":"Double","basicValue":2}],[{"type":"String","basicValue":"left"},{"type":"Double","basicValue":2}],[{"type":"String","basicValue":"right"},{"type":"Double","basicValue":2}]]},"decision_scores":{"type":"Array","elements":[[{"type":"Double","basicValue":-0.801853},{"type":"Double","basicValue":0.811928},{"type":"Double","basicValue":-0.804821}],[{"type":"Double","basicValue":-0.75962},{"type":"Double","basicValue":0.744738},{"type":"Double","basicValue":-0.788505}],[{"type":"Double","basicValue":0.992554},{"type":"Double","basicValue":-0.971388},{"type":"Double","basicValue":-1.04172}],[{"type":"Double","basicValue":0.939812},{"type":"Double","basicValue":-0.994626},{"type":"Double","basicValue":-0.969078}],[{"type":"Double","basicValue":-1.01204},{"type":"Double","basicValue":-0.996617},{"type":"Double","basicValue":0.97425}],[{"type":"Double","basicValue":-0.969803},{"type":"Double","basicValue":-1.06381},{"type":"Double","basicValue":0.990566}]]},"coefficients":{"type":"Array","elements":[[{"type":"Double","basicValue":-0.0262729},{"type":"Double","basicValue":0.474875}],[{"type":"Double","basicValue":-0.226068},{"type":"Double","basicValue":-0.219761}],[{"type":"Double","basicValue":0.222384},{"type":"Double","basicValue":-0.281607}]]},"intercepts":{"type":"Array","elements":[[{"type":"Double","basicValue":-0.801853}],[{"type":"Double","basicValue":0.811928}],[{"type":"Double","basicValue":-0.804821}]]}}}
Example 4: Flatten a single-row boolean target range for linear support vector classification
Inputs:
| data | target | penalty | loss | C | max_iter | tol | fit_intercept | random_state | |||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | false | false | false | true | true | true | l2 | squared_hinge | 1 | 4000 | 0.0001 | true | 0 |
| 0.3 | |||||||||||||
| 0.6 | |||||||||||||
| 1.4 | |||||||||||||
| 1.7 | |||||||||||||
| 2 |
Excel formula:
=LINEAR_SVC({0;0.3;0.6;1.4;1.7;2}, {FALSE,FALSE,FALSE,TRUE,TRUE,TRUE}, "l2", "squared_hinge", 1, 4000, 0.0001, TRUE, 0)
Expected output:
{"type":"Double","basicValue":1,"properties":{"accuracy":{"type":"Double","basicValue":1},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":1},"class_count":{"type":"Double","basicValue":2},"classes":{"type":"Array","elements":[[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":true}]]},"predictions":{"type":"Array","elements":[[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":true}],[{"type":"Boolean","basicValue":true}],[{"type":"Boolean","basicValue":true}]]},"prediction_counts":{"type":"Array","elements":[[{"type":"String","basicValue":"class"},{"type":"String","basicValue":"count"}],[{"type":"Boolean","basicValue":false},{"type":"Double","basicValue":3}],[{"type":"Boolean","basicValue":true},{"type":"Double","basicValue":3}]]},"decision_scores":{"type":"Array","elements":[[{"type":"Double","basicValue":-0.918239}],[{"type":"Double","basicValue":-0.614465}],[{"type":"Double","basicValue":-0.310692}],[{"type":"Double","basicValue":0.499371}],[{"type":"Double","basicValue":0.803145}],[{"type":"Double","basicValue":1.10692}]]},"coefficients":{"type":"Array","elements":[[{"type":"Double","basicValue":1.01258}]]},"intercepts":{"type":"Array","elements":[[{"type":"Double","basicValue":-0.918239}]]}}}
Python Code
import numpy as np
from sklearn.svm import LinearSVC as SklearnLinearSVC
def linear_svc(data, target, penalty='l2', loss='squared_hinge', C=1, max_iter=1000, tol=0.0001, fit_intercept=True, random_state=None):
    """
    Fit a linear support vector classifier and return training predictions.
    See: https://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVC.html
    This example function is provided as-is without any representation of accuracy.
    Args:
        data (list[list]): 2D array of numeric feature data with rows as samples and columns as features.
        target (list[list]): Target labels as a single row, single column, or scalar when only one sample is present.
        penalty (str, optional): Norm used in the linear SVM penalty term. Valid options: L2, L1. Default is 'l2'.
        loss (str, optional): Hinge-style loss function used during fitting. Valid options: Squared Hinge, Hinge. Default is 'squared_hinge'.
        C (float, optional): Inverse regularization strength. Smaller values apply stronger regularization. Default is 1.
        max_iter (int, optional): Maximum number of optimization iterations. Default is 1000.
        tol (float, optional): Convergence tolerance for the optimizer. Default is 0.0001.
        fit_intercept (bool, optional): Whether to include an intercept term in the linear decision function. Default is True.
        random_state (int, optional): Integer seed used when the underlying solver shuffles data. Leave blank for the estimator default. Default is None.
    Returns:
        dict: Excel data type containing training accuracy, predictions, decision scores, and fitted coefficient arrays.
    """
    def py(value):
        return value.item() if isinstance(value, np.generic) else value
    def cell(value):
        value = py(value)
        if isinstance(value, bool):
            return {"type": "Boolean", "basicValue": bool(value)}
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            return {"type": "Double", "basicValue": float(value)}
        return {"type": "String", "basicValue": str(value)}
    def col(values):
        return [[cell(value)] for value in values]
    def mat(values):
        return [[cell(value) for value in row] for row in values]
    def parse_data(value):
        value = [[value]] if not isinstance(value, list) else value
        if not isinstance(value, list) or not value or not all(isinstance(row, list) and row for row in value):
            return None, "Error: data must be a non-empty 2D list"
        if len({len(row) for row in value}) != 1:
            return None, "Error: data must be a rectangular 2D list"
        data_np = np.array(value, dtype=float)
        if data_np.ndim != 2 or data_np.size == 0:
            return None, "Error: data must be a non-empty 2D list"
        if not np.isfinite(data_np).all():
            return None, "Error: data must contain only finite numeric values"
        return data_np, None
    def parse_target(value, sample_count):
        if not isinstance(value, list):
            labels = [value]
        elif not value:
            return None, "Error: target must be non-empty"
        elif all(not isinstance(item, list) for item in value):
            labels = value
        elif len(value) == 1:
            labels = value[0]
        elif all(isinstance(row, list) and len(row) == 1 for row in value):
            labels = [row[0] for row in value]
        else:
            return None, "Error: target must be a single row or column"
        if len(labels) != sample_count:
            return None, "Error: target length must match sample count"
        parsed = []
        classes = []
        for item in labels:
            item = py(item)
            if isinstance(item, str):
                if not item.strip():
                    return None, "Error: target labels must not be blank"
            elif isinstance(item, bool):
                item = bool(item)
            elif isinstance(item, (int, float)) and not isinstance(item, bool):
                if not np.isfinite(float(item)):
                    return None, "Error: target labels must be finite"
                item = float(item) if isinstance(item, float) else int(item)
            else:
                return None, "Error: target labels must be strings, booleans, or numbers"
            parsed.append(item)
            if not any(type(existing) is type(item) and existing == item for existing in classes):
                classes.append(item)
        if len(classes) < 2:
            return None, "Error: target must contain at least 2 classes"
        return parsed, None
    def count_table(predictions, classes):
        rows = [[{"type": "String", "basicValue": "class"}, {"type": "String", "basicValue": "count"}]]
        for class_label in classes:
            rows.append([cell(class_label), {"type": "Double", "basicValue": float(sum(type(prediction) is type(class_label) and prediction == class_label for prediction in predictions))}])
        return rows
    try:
        data_np, error = parse_data(data)
        if error:
            return error
        target_values, error = parse_target(target, data_np.shape[0])
        if error:
            return error
        penalty_value = str(penalty).strip().lower()
        if penalty_value not in {"l1", "l2"}:
            return "Error: penalty must be 'l1' or 'l2'"
        loss_value = str(loss).strip().lower()
        if loss_value not in {"hinge", "squared_hinge"}:
            return "Error: loss must be 'hinge' or 'squared_hinge'"
        if penalty_value == "l1" and loss_value == "hinge":
            return "Error: penalty 'l1' cannot be combined with loss 'hinge'"
        if float(C) <= 0:
            return "Error: C must be greater than 0"
        if int(max_iter) < 1:
            return "Error: max_iter must be at least 1"
        if float(tol) <= 0:
            return "Error: tol must be greater than 0"
        fitted = SklearnLinearSVC(
            penalty=penalty_value,
            loss=loss_value,
            C=float(C),
            max_iter=int(max_iter),
            tol=float(tol),
            fit_intercept=bool(fit_intercept),
            random_state=None if random_state in (None, "") else int(random_state),
            dual="auto"
        ).fit(data_np, target_values)
        prediction_array = fitted.predict(data_np)
        predictions = [py(item) for item in prediction_array.tolist()]
        classes = [py(item) for item in fitted.classes_.tolist()]
        scores = np.asarray(fitted.decision_function(data_np))
        score_rows = [[float(value)] for value in scores.tolist()] if scores.ndim == 1 else scores.tolist()
        accuracy = float(np.mean([type(prediction) is type(actual) and prediction == actual for prediction, actual in zip(predictions, target_values)]))
        return {
            "type": "Double",
            "basicValue": accuracy,
            "properties": {
                "accuracy": {"type": "Double", "basicValue": accuracy},
                "sample_count": {"type": "Double", "basicValue": float(data_np.shape[0])},
                "feature_count": {"type": "Double", "basicValue": float(data_np.shape[1])},
                "class_count": {"type": "Double", "basicValue": float(len(classes))},
                "classes": {"type": "Array", "elements": col(classes)},
                "predictions": {"type": "Array", "elements": col(predictions)},
                "prediction_counts": {"type": "Array", "elements": count_table(predictions, classes)},
                "decision_scores": {"type": "Array", "elements": mat(score_rows)},
                "coefficients": {"type": "Array", "elements": mat(np.atleast_2d(fitted.coef_).tolist())},
                "intercepts": {"type": "Array", "elements": col(np.atleast_1d(fitted.intercept_).tolist())}
            }
        }
    except Exception as e:
        return f"Error: {str(e)}"
LOGISTIC_CLS
Logistic regression is a linear classification method that models class membership probabilities from a weighted combination of the input features. For a binary classification problem, the probability of the positive class is modeled using the sigmoid function:
P(y=1 \mid x) = \frac{1}{1 + \exp(-(w^T x + b))}
Regularization (L1 or L2) helps stabilize the fitted coefficients and reduce overfitting on small or collinear datasets. The model parameters are estimated by minimizing the cross-entropy loss.
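The sigmoid mapping above can be sketched directly. The weights, intercept, and sample below are illustrative stand-ins, not values produced by this wrapper:

```python
import numpy as np

# Sketch of the binary logistic model P(y=1 | x) = sigmoid(w.x + b).
# w, b, and x are made-up illustrative values, not fitted parameters.
def sigmoid_probability(x, w, b):
    z = np.dot(w, x) + b             # linear decision score w^T x + b
    return 1.0 / (1.0 + np.exp(-z))  # squash the score into (0, 1)

p = sigmoid_probability(np.array([1.0, 1.0]), np.array([0.9, 0.7]), -1.5)
# z = 0.9 + 0.7 - 1.5 = 0.1, so p sits just above 0.5 and the
# sample would be assigned to the positive class
```

A score of exactly zero corresponds to a probability of 0.5, which is the decision boundary of the fitted model.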
This wrapper accepts tabular feature data with rows as samples and columns as features, plus a target supplied as a single row or single column. It returns the training accuracy together with the learned classes, fitted predictions, class counts, class probabilities, and fitted coefficient arrays.
Excel Usage
=LOGISTIC_CLS(data, target, penalty, C, solver, max_iter, fit_intercept, random_state)
data (list[list], required): 2D array of numeric feature data with rows as samples and columns as features.
target (list[list], required): Target labels as a single row, single column, or scalar when only one sample is present.
penalty (str, optional, default: "l2"): Regularization penalty applied to the logistic model.
C (float, optional, default: 1): Inverse regularization strength. Smaller values apply stronger regularization.
solver (str, optional, default: "lbfgs"): Optimization algorithm used to fit the classifier.
max_iter (int, optional, default: 200): Maximum number of solver iterations.
fit_intercept (bool, optional, default: true): Whether to include an intercept term in the linear decision function.
random_state (int, optional, default: null): Integer seed for solvers that use randomness. Leave blank for the estimator default.
Returns (dict): Excel data type containing training accuracy, predictions, probabilities, and fitted coefficient arrays.
Example 1: Fit logistic regression for two string-labeled classes
Inputs:
| data | target | penalty | C | solver | max_iter | fit_intercept | random_state | |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | cold | l2 | 1 | lbfgs | 200 | true | 0 |
| 0 | 1 | cold | ||||||
| 1 | 0 | cold | ||||||
| 1 | 1 | hot | ||||||
| 2 | 1 | hot | ||||||
| 2 | 2 | hot |
Excel formula:
=LOGISTIC_CLS({0,0;0,1;1,0;1,1;2,1;2,2}, {"cold";"cold";"cold";"hot";"hot";"hot"}, "l2", 1, "lbfgs", 200, TRUE, 0)
Expected output:
{"type":"Double","basicValue":1,"properties":{"accuracy":{"type":"Double","basicValue":1},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":2},"class_count":{"type":"Double","basicValue":2},"classes":{"type":"Array","elements":[[{"type":"String","basicValue":"cold"}],[{"type":"String","basicValue":"hot"}]]},"predictions":{"type":"Array","elements":[[{"type":"String","basicValue":"cold"}],[{"type":"String","basicValue":"cold"}],[{"type":"String","basicValue":"cold"}],[{"type":"String","basicValue":"hot"}],[{"type":"String","basicValue":"hot"}],[{"type":"String","basicValue":"hot"}]]},"prediction_counts":{"type":"Array","elements":[[{"type":"String","basicValue":"class"},{"type":"String","basicValue":"count"}],[{"type":"String","basicValue":"cold"},{"type":"Double","basicValue":3}],[{"type":"String","basicValue":"hot"},{"type":"Double","basicValue":3}]]},"probabilities":{"type":"Array","elements":[[{"type":"Double","basicValue":0.812399},{"type":"Double","basicValue":0.187601}],[{"type":"Double","basicValue":0.682647},{"type":"Double","basicValue":0.317353}],[{"type":"Double","basicValue":0.635306},{"type":"Double","basicValue":0.364694}],[{"type":"Double","basicValue":0.463897},{"type":"Double","basicValue":0.536103}],[{"type":"Double","basicValue":0.258211},{"type":"Double","basicValue":0.741789}],[{"type":"Double","basicValue":0.147418},{"type":"Double","basicValue":0.852582}]]},"coefficients":{"type":"Array","elements":[[{"type":"Double","basicValue":0.910626},{"type":"Double","basicValue":0.699711}]]},"intercepts":{"type":"Array","elements":[[{"type":"Double","basicValue":-1.46567}]]}}}
Example 2: Classify one-dimensional samples with numeric labels
Inputs:
| data | target | penalty | C | solver | max_iter | fit_intercept | random_state |
|---|---|---|---|---|---|---|---|
| 0 | 0 | l2 | 1 | lbfgs | 200 | true | 0 |
| 0.2 | 0 | ||||||
| 0.4 | 0 | ||||||
| 1.2 | 1 | ||||||
| 1.4 | 1 | ||||||
| 1.6 | 1 |
Excel formula:
=LOGISTIC_CLS({0;0.2;0.4;1.2;1.4;1.6}, {0;0;0;1;1;1}, "l2", 1, "lbfgs", 200, TRUE, 0)
Expected output:
{"type":"Double","basicValue":1,"properties":{"accuracy":{"type":"Double","basicValue":1},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":1},"class_count":{"type":"Double","basicValue":2},"classes":{"type":"Array","elements":[[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1}]]},"predictions":{"type":"Array","elements":[[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":1}]]},"prediction_counts":{"type":"Array","elements":[[{"type":"String","basicValue":"class"},{"type":"String","basicValue":"count"}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":3}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":3}]]},"probabilities":{"type":"Array","elements":[[{"type":"Double","basicValue":0.716801},{"type":"Double","basicValue":0.283199}],[{"type":"Double","basicValue":0.667409},{"type":"Double","basicValue":0.332591}],[{"type":"Double","basicValue":0.61404},{"type":"Double","basicValue":0.38596}],[{"type":"Double","basicValue":0.38596},{"type":"Double","basicValue":0.61404}],[{"type":"Double","basicValue":0.332591},{"type":"Double","basicValue":0.667409}],[{"type":"Double","basicValue":0.283199},{"type":"Double","basicValue":0.716801}]]},"coefficients":{"type":"Array","elements":[[{"type":"Double","basicValue":1.16081}]]},"intercepts":{"type":"Array","elements":[[{"type":"Double","basicValue":-0.92865}]]}}}
Example 3: Fit a three-class logistic model on separated groups
Inputs:
| data | target | penalty | C | solver | max_iter | fit_intercept | random_state | |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | left | l2 | 1 | lbfgs | 200 | true | 0 |
| 0.2 | 0.1 | left | ||||||
| 4 | 4 | center | ||||||
| 4.2 | 3.9 | center | ||||||
| 8 | 0 | right | ||||||
| 8.2 | 0.1 | right |
Excel formula:
=LOGISTIC_CLS({0,0;0.2,0.1;4,4;4.2,3.9;8,0;8.2,0.1}, {"left";"left";"center";"center";"right";"right"}, "l2", 1, "lbfgs", 200, TRUE, 0)
Expected output:
{"type":"Double","basicValue":1,"properties":{"accuracy":{"type":"Double","basicValue":1},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":2},"class_count":{"type":"Double","basicValue":3},"classes":{"type":"Array","elements":[[{"type":"String","basicValue":"center"}],[{"type":"String","basicValue":"left"}],[{"type":"String","basicValue":"right"}]]},"predictions":{"type":"Array","elements":[[{"type":"String","basicValue":"left"}],[{"type":"String","basicValue":"left"}],[{"type":"String","basicValue":"center"}],[{"type":"String","basicValue":"center"}],[{"type":"String","basicValue":"right"}],[{"type":"String","basicValue":"right"}]]},"prediction_counts":{"type":"Array","elements":[[{"type":"String","basicValue":"class"},{"type":"String","basicValue":"count"}],[{"type":"String","basicValue":"center"},{"type":"Double","basicValue":2}],[{"type":"String","basicValue":"left"},{"type":"Double","basicValue":2}],[{"type":"String","basicValue":"right"},{"type":"Double","basicValue":2}]]},"probabilities":{"type":"Array","elements":[[{"type":"Double","basicValue":0.0391472},{"type":"Double","basicValue":0.949817},{"type":"Double","basicValue":0.0110358}],[{"type":"Double","basicValue":0.0477367},{"type":"Double","basicValue":0.938711},{"type":"Double","basicValue":0.013552}],[{"type":"Double","basicValue":0.917197},{"type":"Double","basicValue":0.0437528},{"type":"Double","basicValue":0.0390499}],[{"type":"Double","basicValue":0.909183},{"type":"Double","basicValue":0.0430583},{"type":"Double","basicValue":0.0477584}],[{"type":"Double","basicValue":0.043504},{"type":"Double","basicValue":0.0136633},{"type":"Double","basicValue":0.942833}],[{"type":"Double","basicValue":0.0433284},{"type":"Double","basicValue":0.0110291},{"type":"Double","basicValue":0.945642}]]},"coefficients":{"type":"Array","elements":[[{"type":"Double","basicValue":0.000202248},{"type":"Double","basicValue":0.676631}],[{"type":"Double","basicValue":-0.543183},{"type":"Double","basicValue":-0.337911}],[{"type":"Double","basicValue":0.54298},{"type":"Double","basicValue":-0.33872}]]},"intercepts":{"type":"Array","elements":[[{"type":"Double","basicValue":-0.640917}],[{"type":"Double","basicValue":2.54802}],[{"type":"Double","basicValue":-1.90711}]]}}}
Example 4: Flatten a single-row boolean target range
Inputs:
| data | target | penalty | C | solver | max_iter | fit_intercept | random_state | |||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | false | false | false | true | true | true | l2 | 1 | lbfgs | 200 | true | 0 |
| 0.3 | ||||||||||||
| 0.6 | ||||||||||||
| 1.4 | ||||||||||||
| 1.7 | ||||||||||||
| 2 |
Excel formula:
=LOGISTIC_CLS({0;0.3;0.6;1.4;1.7;2}, {FALSE,FALSE,FALSE,TRUE,TRUE,TRUE}, "l2", 1, "lbfgs", 200, TRUE, 0)
Expected output:
{"type":"Double","basicValue":1,"properties":{"accuracy":{"type":"Double","basicValue":1},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":1},"class_count":{"type":"Double","basicValue":2},"classes":{"type":"Array","elements":[[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":true}]]},"predictions":{"type":"Array","elements":[[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":true}],[{"type":"Boolean","basicValue":true}],[{"type":"Boolean","basicValue":true}]]},"prediction_counts":{"type":"Array","elements":[[{"type":"String","basicValue":"class"},{"type":"String","basicValue":"count"}],[{"type":"Boolean","basicValue":false},{"type":"Double","basicValue":3}],[{"type":"Boolean","basicValue":true},{"type":"Double","basicValue":3}]]},"probabilities":{"type":"Array","elements":[[{"type":"Double","basicValue":0.7675},{"type":"Double","basicValue":0.2325}],[{"type":"Double","basicValue":0.697605},{"type":"Double","basicValue":0.302395}],[{"type":"Double","basicValue":0.617178},{"type":"Double","basicValue":0.382822}],[{"type":"Double","basicValue":0.382734},{"type":"Double","basicValue":0.617266}],[{"type":"Double","basicValue":0.302317},{"type":"Double","basicValue":0.697683}],[{"type":"Double","basicValue":0.232434},{"type":"Double","basicValue":0.767566}]]},"coefficients":{"type":"Array","elements":[[{"type":"Double","basicValue":1.19443}]]},"intercepts":{"type":"Array","elements":[[{"type":"Double","basicValue":-1.19425}]]}}}
Python Code
import numpy as np
from sklearn.linear_model import LogisticRegression as SklearnLogisticRegression
def logistic_cls(data, target, penalty='l2', C=1, solver='lbfgs', max_iter=200, fit_intercept=True, random_state=None):
    """
    Fit a regularized logistic regression classifier and return training predictions.
    See: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
    This example function is provided as-is without any representation of accuracy.
    Args:
        data (list[list]): 2D array of numeric feature data with rows as samples and columns as features.
        target (list[list]): Target labels as a single row, single column, or scalar when only one sample is present.
        penalty (str, optional): Regularization penalty applied to the logistic model. Valid options: L2, L1. Default is 'l2'.
        C (float, optional): Inverse regularization strength. Smaller values apply stronger regularization. Default is 1.
        solver (str, optional): Optimization algorithm used to fit the classifier. Valid options: LBFGS, Liblinear, SAGA. Default is 'lbfgs'.
        max_iter (int, optional): Maximum number of solver iterations. Default is 200.
        fit_intercept (bool, optional): Whether to include an intercept term in the linear decision function. Default is True.
        random_state (int, optional): Integer seed for solvers that use randomness. Leave blank for the estimator default. Default is None.
    Returns:
        dict: Excel data type containing training accuracy, predictions, probabilities, and fitted coefficient arrays.
    """
    def py(value):
        return value.item() if isinstance(value, np.generic) else value
    def cell(value):
        value = py(value)
        if isinstance(value, bool):
            return {"type": "Boolean", "basicValue": bool(value)}
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            return {"type": "Double", "basicValue": float(value)}
        return {"type": "String", "basicValue": str(value)}
    def col(values):
        return [[cell(value)] for value in values]
    def mat(values):
        return [[cell(value) for value in row] for row in values]
    def parse_data(value):
        value = [[value]] if not isinstance(value, list) else value
        if not isinstance(value, list) or not value or not all(isinstance(row, list) and row for row in value):
            return None, "Error: data must be a non-empty 2D list"
        if len({len(row) for row in value}) != 1:
            return None, "Error: data must be a rectangular 2D list"
        data_np = np.array(value, dtype=float)
        if data_np.ndim != 2 or data_np.size == 0:
            return None, "Error: data must be a non-empty 2D list"
        if not np.isfinite(data_np).all():
            return None, "Error: data must contain only finite numeric values"
        return data_np, None
    def parse_target(value, sample_count):
        if not isinstance(value, list):
            labels = [value]
        elif not value:
            return None, "Error: target must be non-empty"
        elif all(not isinstance(item, list) for item in value):
            labels = value
        elif len(value) == 1:
            labels = value[0]
        elif all(isinstance(row, list) and len(row) == 1 for row in value):
            labels = [row[0] for row in value]
        else:
            return None, "Error: target must be a single row or column"
        if len(labels) != sample_count:
            return None, "Error: target length must match sample count"
        parsed = []
        classes = []
        for item in labels:
            item = py(item)
            if isinstance(item, str):
                if not item.strip():
                    return None, "Error: target labels must not be blank"
            elif isinstance(item, bool):
                item = bool(item)
            elif isinstance(item, (int, float)) and not isinstance(item, bool):
                if not np.isfinite(float(item)):
                    return None, "Error: target labels must be finite"
                item = float(item) if isinstance(item, float) else int(item)
            else:
                return None, "Error: target labels must be scalar string, boolean, or numeric values"
            parsed.append(item)
            if not any(type(existing) is type(item) and existing == item for existing in classes):
                classes.append(item)
        if len(classes) < 2:
            return None, "Error: target must contain at least 2 classes"
        return parsed, None
    def count_table(predictions, classes):
        rows = [[{"type": "String", "basicValue": "class"}, {"type": "String", "basicValue": "count"}]]
        for class_label in classes:
            count = sum(type(prediction) is type(class_label) and prediction == class_label for prediction in predictions)
            rows.append([cell(class_label), {"type": "Double", "basicValue": float(count)}])
        return rows
    try:
        data_np, error = parse_data(data)
        if error:
            return error
        target_values, error = parse_target(target, data_np.shape[0])
        if error:
            return error
        penalty_value = str(penalty).strip().lower()
        if penalty_value not in {"l1", "l2"}:
            return "Error: penalty must be 'l1' or 'l2'"
        if float(C) <= 0:
            return "Error: C must be greater than 0"
        solver_value = str(solver).strip().lower()
        if solver_value not in {"lbfgs", "liblinear", "saga"}:
            return "Error: solver must be 'lbfgs', 'liblinear', or 'saga'"
        if penalty_value == "l1" and solver_value not in {"liblinear", "saga"}:
            return "Error: solver must be 'liblinear' or 'saga' when penalty is 'l1'"
        if int(max_iter) < 1:
            return "Error: max_iter must be at least 1"
        fitted = SklearnLogisticRegression(
            penalty=penalty_value,
            C=float(C),
            solver=solver_value,
            max_iter=int(max_iter),
            fit_intercept=bool(fit_intercept),
            random_state=None if random_state in (None, "") else int(random_state)
        ).fit(data_np, target_values)
        prediction_array = fitted.predict(data_np)
        predictions = [py(item) for item in prediction_array.tolist()]
        classes = [py(item) for item in fitted.classes_.tolist()]
        accuracy = float(np.mean([
            type(prediction) is type(actual) and prediction == actual
            for prediction, actual in zip(predictions, target_values)
        ]))
        return {
            "type": "Double",
            "basicValue": accuracy,
            "properties": {
                "accuracy": {"type": "Double", "basicValue": accuracy},
                "sample_count": {"type": "Double", "basicValue": float(data_np.shape[0])},
                "feature_count": {"type": "Double", "basicValue": float(data_np.shape[1])},
                "class_count": {"type": "Double", "basicValue": float(len(classes))},
                "classes": {"type": "Array", "elements": col(classes)},
                "predictions": {"type": "Array", "elements": col(predictions)},
                "prediction_counts": {"type": "Array", "elements": count_table(predictions, classes)},
                "probabilities": {"type": "Array", "elements": mat(fitted.predict_proba(data_np).tolist())},
                "coefficients": {"type": "Array", "elements": mat(fitted.coef_.tolist())},
                "intercepts": {"type": "Array", "elements": col(fitted.intercept_.tolist())}
            }
        }
    except Exception as e:
        return f"Error: {str(e)}"
RF_CLASSIFY
Random forest classification averages the predictions of many decision trees trained on resampled data (bootstrapping) and feature subsets. For a classification problem with B trees, the ensemble prediction \hat{y} is determined by majority vote:
\hat{y} = \text{mode}\{T_1(x), T_2(x), \dots, T_B(x)\}
This approach reduces model variance without significantly increasing bias, making it a strong default for nonlinear tabular classification. It also exposes feature-importance estimates based on impurity reduction at split nodes.
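The majority vote above reduces to taking the mode of the per-tree labels. A minimal sketch with made-up tree outputs for a single sample:

```python
from collections import Counter

# Hypothetical per-tree predictions T_1(x), ..., T_5(x) for one sample;
# these labels are illustrative, not output from the wrapper below.
tree_predictions = ["hot", "cold", "hot", "hot", "cold"]

# Ensemble prediction = mode (most common label) of the tree votes.
ensemble_prediction = Counter(tree_predictions).most_common(1)[0][0]
# "hot" wins the vote 3 to 2
```

The per-class probabilities reported by the wrapper are the corresponding vote fractions averaged across trees.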
This wrapper accepts rows as samples and a target supplied as a single row or single column. It returns training accuracy together with predicted labels, class counts, class probabilities, and fitted feature importances.
Excel Usage
=RF_CLASSIFY(data, target, n_estimators, rf_criterion, max_depth, min_samples_leaf, random_state)
data (list[list], required): 2D array of numeric feature data with rows as samples and columns as features.
target (list[list], required): Target labels as a single row, single column, or scalar when only one sample is present.
n_estimators (int, optional, default: 100): Number of trees in the forest.
rf_criterion (str, optional, default: "gini"): Split quality measure used by each decision tree.
max_depth (int, optional, default: null): Maximum depth of each tree. Leave blank for unconstrained depth.
min_samples_leaf (int, optional, default: 1): Minimum number of samples required in each leaf.
random_state (int, optional, default: null): Integer seed for reproducible tree sampling. Leave blank for the estimator default.
Returns (dict): Excel data type containing training accuracy, predictions, probabilities, and fitted feature importances.
Example 1: Fit a random forest classifier for two string-labeled groups
Inputs:
| data | target | n_estimators | rf_criterion | max_depth | min_samples_leaf | random_state | |
|---|---|---|---|---|---|---|---|
| 0 | 0 | cold | 25 | gini | 3 | 1 | 0 |
| 0 | 1 | cold | |||||
| 1 | 0 | cold | |||||
| 2 | 2 | hot | |||||
| 2 | 3 | hot | |||||
| 3 | 2 | hot |
Excel formula:
=RF_CLASSIFY({0,0;0,1;1,0;2,2;2,3;3,2}, {"cold";"cold";"cold";"hot";"hot";"hot"}, 25, "gini", 3, 1, 0)
Expected output:
{"type":"Double","basicValue":1,"properties":{"accuracy":{"type":"Double","basicValue":1},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":2},"class_count":{"type":"Double","basicValue":2},"classes":{"type":"Array","elements":[[{"type":"String","basicValue":"cold"}],[{"type":"String","basicValue":"hot"}]]},"predictions":{"type":"Array","elements":[[{"type":"String","basicValue":"cold"}],[{"type":"String","basicValue":"cold"}],[{"type":"String","basicValue":"cold"}],[{"type":"String","basicValue":"hot"}],[{"type":"String","basicValue":"hot"}],[{"type":"String","basicValue":"hot"}]]},"prediction_counts":{"type":"Array","elements":[[{"type":"String","basicValue":"class"},{"type":"String","basicValue":"count"}],[{"type":"String","basicValue":"cold"},{"type":"Double","basicValue":3}],[{"type":"String","basicValue":"hot"},{"type":"Double","basicValue":3}]]},"probabilities":{"type":"Array","elements":[[{"type":"Double","basicValue":1},{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0.04},{"type":"Double","basicValue":0.96}],[{"type":"Double","basicValue":0.04},{"type":"Double","basicValue":0.96}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":1}]]},"feature_importances":{"type":"Array","elements":[[{"type":"Double","basicValue":0.48}],[{"type":"Double","basicValue":0.52}]]},"estimator_count":{"type":"Double","basicValue":25}}}
Example 2: Use entropy splits for one-dimensional numeric labels
Inputs:
| data | target | n_estimators | rf_criterion | max_depth | min_samples_leaf | random_state |
|---|---|---|---|---|---|---|
| 0 | 0 | 25 | entropy | 3 | 1 | 0 |
| 0.2 | 0 | |||||
| 0.4 | 0 | |||||
| 1.2 | 1 | |||||
| 1.4 | 1 | |||||
| 1.6 | 1 |
Excel formula:
=RF_CLASSIFY({0;0.2;0.4;1.2;1.4;1.6}, {0;0;0;1;1;1}, 25, "entropy", 3, 1, 0)
Expected output:
{"type":"Double","basicValue":1,"properties":{"accuracy":{"type":"Double","basicValue":1},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":1},"class_count":{"type":"Double","basicValue":2},"classes":{"type":"Array","elements":[[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1}]]},"predictions":{"type":"Array","elements":[[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":1}]]},"prediction_counts":{"type":"Array","elements":[[{"type":"String","basicValue":"class"},{"type":"String","basicValue":"count"}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":3}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":3}]]},"probabilities":{"type":"Array","elements":[[{"type":"Double","basicValue":1},{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":1}]]},"feature_importances":{"type":"Array","elements":[[{"type":"Double","basicValue":1}]]},"estimator_count":{"type":"Double","basicValue":25}}}
Example 3: Fit a random forest classifier for three separated groups
Inputs:
| data | target | n_estimators | rf_criterion | max_depth | min_samples_leaf | random_state | |
|---|---|---|---|---|---|---|---|
| 0 | 0 | left | 25 | gini | 3 | 1 | 0 |
| 0.2 | 0.1 | left | |||||
| 4 | 4 | center | |||||
| 4.2 | 3.9 | center | |||||
| 8 | 0 | right | |||||
| 8.2 | 0.1 | right |
Excel formula:
=RF_CLASSIFY({0,0;0.2,0.1;4,4;4.2,3.9;8,0;8.2,0.1}, {"left";"left";"center";"center";"right";"right"}, 25, "gini", 3, 1, 0)
Expected output:
{"type":"Double","basicValue":1,"properties":{"accuracy":{"type":"Double","basicValue":1},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":2},"class_count":{"type":"Double","basicValue":3},"classes":{"type":"Array","elements":[[{"type":"String","basicValue":"center"}],[{"type":"String","basicValue":"left"}],[{"type":"String","basicValue":"right"}]]},"predictions":{"type":"Array","elements":[[{"type":"String","basicValue":"left"}],[{"type":"String","basicValue":"left"}],[{"type":"String","basicValue":"center"}],[{"type":"String","basicValue":"center"}],[{"type":"String","basicValue":"right"}],[{"type":"String","basicValue":"right"}]]},"prediction_counts":{"type":"Array","elements":[[{"type":"String","basicValue":"class"},{"type":"String","basicValue":"count"}],[{"type":"String","basicValue":"center"},{"type":"Double","basicValue":2}],[{"type":"String","basicValue":"left"},{"type":"Double","basicValue":2}],[{"type":"String","basicValue":"right"},{"type":"Double","basicValue":2}]]},"probabilities":{"type":"Array","elements":[[{"type":"Double","basicValue":0.04},{"type":"Double","basicValue":0.92},{"type":"Double","basicValue":0.04}],[{"type":"Double","basicValue":0.04},{"type":"Double","basicValue":0.88},{"type":"Double","basicValue":0.08}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":0},{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":0},{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":0.04},{"type":"Double","basicValue":0.96}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":0.12},{"type":"Double","basicValue":0.88}]]},"feature_importances":{"type":"Array","elements":[[{"type":"Double","basicValue":0.603273}],[{"type":"Double","basicValue":0.396727}]]},"estimator_count":{"type":"Double","basicValue":25}}}
Example 4: Flatten a single-row boolean target range for random forest classification
Inputs:
| data | target | n_estimators | rf_criterion | max_depth | min_samples_leaf | random_state | |||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | false | false | false | true | true | true | 25 | gini | 3 | 1 | 0 |
| 0.3 | |||||||||||
| 0.6 | |||||||||||
| 1.4 | |||||||||||
| 1.7 | |||||||||||
| 2 |
Excel formula:
=RF_CLASSIFY({0;0.3;0.6;1.4;1.7;2}, {FALSE,FALSE,FALSE,TRUE,TRUE,TRUE}, 25, "gini", 3, 1, 0)
Expected output:
{"type":"Double","basicValue":1,"properties":{"accuracy":{"type":"Double","basicValue":1},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":1},"class_count":{"type":"Double","basicValue":2},"classes":{"type":"Array","elements":[[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":true}]]},"predictions":{"type":"Array","elements":[[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":true}],[{"type":"Boolean","basicValue":true}],[{"type":"Boolean","basicValue":true}]]},"prediction_counts":{"type":"Array","elements":[[{"type":"String","basicValue":"class"},{"type":"String","basicValue":"count"}],[{"type":"Boolean","basicValue":false},{"type":"Double","basicValue":3}],[{"type":"Boolean","basicValue":true},{"type":"Double","basicValue":3}]]},"probabilities":{"type":"Array","elements":[[{"type":"Double","basicValue":1},{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":1}]]},"feature_importances":{"type":"Array","elements":[[{"type":"Double","basicValue":1}]]},"estimator_count":{"type":"Double","basicValue":25}}}
Python Code
import numpy as np
from sklearn.ensemble import RandomForestClassifier as SklearnRandomForestClassifier
def rf_classify(data, target, n_estimators=100, rf_criterion='gini', max_depth=None, min_samples_leaf=1, random_state=None):
    """
    Fit a random forest classifier and return training predictions.
    See: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
    This example function is provided as-is without any representation of accuracy.
    Args:
        data (list[list]): 2D array of numeric feature data with rows as samples and columns as features.
        target (list[list]): Target labels as a single row, single column, or scalar when only one sample is present.
        n_estimators (int, optional): Number of trees in the forest. Default is 100.
        rf_criterion (str, optional): Split quality measure used by each decision tree. Valid options: Gini, Entropy, Log Loss. Default is 'gini'.
        max_depth (int, optional): Maximum depth of each tree. Leave blank for unconstrained depth. Default is None.
        min_samples_leaf (int, optional): Minimum number of samples required in each leaf. Default is 1.
        random_state (int, optional): Integer seed for reproducible tree sampling. Leave blank for the estimator default. Default is None.
    Returns:
        dict: Excel data type containing training accuracy, predictions, probabilities, and fitted feature importances.
    """
    def py(value):
        return value.item() if isinstance(value, np.generic) else value
    def cell(value):
        value = py(value)
        if isinstance(value, bool):
            return {"type": "Boolean", "basicValue": bool(value)}
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            return {"type": "Double", "basicValue": float(value)}
        return {"type": "String", "basicValue": str(value)}
    def col(values):
        return [[cell(value)] for value in values]
    def mat(values):
        return [[cell(value) for value in row] for row in values]
    def parse_data(value):
        value = [[value]] if not isinstance(value, list) else value
        if not isinstance(value, list) or not value or not all(isinstance(row, list) and row for row in value):
            return None, "Error: data must be a non-empty 2D list"
        if len({len(row) for row in value}) != 1:
            return None, "Error: data must be a rectangular 2D list"
        data_np = np.array(value, dtype=float)
        if data_np.ndim != 2 or data_np.size == 0:
            return None, "Error: data must be a non-empty 2D list"
        if not np.isfinite(data_np).all():
            return None, "Error: data must contain only finite numeric values"
        return data_np, None
    def parse_target(value, sample_count):
        if not isinstance(value, list):
            labels = [value]
        elif not value:
            return None, "Error: target must be non-empty"
        elif all(not isinstance(item, list) for item in value):
            labels = value
        elif len(value) == 1:
            labels = value[0]
        elif all(isinstance(row, list) and len(row) == 1 for row in value):
            labels = [row[0] for row in value]
        else:
            return None, "Error: target must be a single row or column"
        if len(labels) != sample_count:
            return None, "Error: target length must match sample count"
        parsed = []
        classes = []
        for item in labels:
            item = py(item)
            if isinstance(item, str):
                if not item.strip():
                    return None, "Error: target labels must not be blank"
            elif isinstance(item, bool):
                item = bool(item)
            elif isinstance(item, (int, float)) and not isinstance(item, bool):
                if not np.isfinite(float(item)):
                    return None, "Error: target labels must be finite"
                item = float(item) if isinstance(item, float) else int(item)
            else:
                return None, "Error: target labels must be scalar string, boolean, or numeric values"
            parsed.append(item)
            if not any(type(existing) is type(item) and existing == item for existing in classes):
                classes.append(item)
        if len(classes) < 2:
            return None, "Error: target must contain at least 2 classes"
        return parsed, None
    def count_table(predictions, classes):
        rows = [[{"type": "String", "basicValue": "class"}, {"type": "String", "basicValue": "count"}]]
        for class_label in classes:
            count = sum(type(prediction) is type(class_label) and prediction == class_label for prediction in predictions)
            rows.append([cell(class_label), {"type": "Double", "basicValue": float(count)}])
        return rows
    try:
        data_np, error = parse_data(data)
        if error:
            return error
        target_values, error = parse_target(target, data_np.shape[0])
        if error:
            return error
        if int(n_estimators) < 1:
            return "Error: n_estimators must be at least 1"
        criterion_value = str(rf_criterion).strip().lower()
        if criterion_value not in {"gini", "entropy", "log_loss"}:
            return "Error: rf_criterion must be 'gini', 'entropy', or 'log_loss'"
        depth = None if max_depth in (None, "") else int(max_depth)
        if depth is not None and depth < 1:
            return "Error: max_depth must be at least 1 when provided"
        if int(min_samples_leaf) < 1:
            return "Error: min_samples_leaf must be at least 1"
        fitted = SklearnRandomForestClassifier(
            n_estimators=int(n_estimators),
            criterion=criterion_value,
            max_depth=depth,
            min_samples_leaf=int(min_samples_leaf),
            random_state=None if random_state in (None, "") else int(random_state)
        ).fit(data_np, target_values)
        prediction_array = fitted.predict(data_np)
        predictions = [py(item) for item in prediction_array.tolist()]
        classes = [py(item) for item in fitted.classes_.tolist()]
        accuracy = float(np.mean([
            type(prediction) is type(actual) and prediction == actual
            for prediction, actual in zip(predictions, target_values)
        ]))
        return {
            "type": "Double",
            "basicValue": accuracy,
            "properties": {
                "accuracy": {"type": "Double", "basicValue": accuracy},
                "sample_count": {"type": "Double", "basicValue": float(data_np.shape[0])},
                "feature_count": {"type": "Double", "basicValue": float(data_np.shape[1])},
                "class_count": {"type": "Double", "basicValue": float(len(classes))},
"classes": {"type": "Array", "elements": col(classes)},
"predictions": {"type": "Array", "elements": col(predictions)},
"prediction_counts": {"type": "Array", "elements": count_table(predictions, classes)},
"probabilities": {"type": "Array", "elements": mat(fitted.predict_proba(data_np).tolist())},
"feature_importances": {"type": "Array", "elements": col(fitted.feature_importances_.tolist())},
"estimator_count": {"type": "Double", "basicValue": float(len(fitted.estimators_))}
}
}
except Exception as e:
return f"Error: {str(e)}"
RIDGE_CLASSIFY
Ridge classification converts the target labels to \{-1, 1\} (for binary; multiclass targets are handled one-vs-rest) and treats the task as a regression problem with L2 regularization. The weights w are found by minimizing the penalized squared error:
\min_{w} \|Xw - y\|^2 + \alpha \|w\|^2
The predicted class is the one with the largest linear decision score. It is a fast baseline for dense tabular classification tasks when calibrated probabilities are not required.
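The closed-form solution to this objective can be sketched directly in NumPy using Example 1's data — a minimal illustration assuming binary labels mapped to \{-1, 1\} and an unpenalized intercept (handled here by centering), not the wrapper's actual code path, which delegates to scikit-learn's `RidgeClassifier`:

```python
import numpy as np

# Example 1 data; "cold" -> -1, "hot" -> +1 (assumed label mapping)
X = np.array([[0, 0], [0, 1], [1, 0], [2, 2], [2, 3], [3, 2]], dtype=float)
y = np.array([-1, -1, -1, 1, 1, 1], dtype=float)
alpha = 1.0

# Center features and target so the intercept is left unpenalized,
# then solve the penalized normal equations (Xc^T Xc + alpha*I) w = Xc^T yc.
x_mean, y_mean = X.mean(axis=0), y.mean()
Xc, yc = X - x_mean, y - y_mean
w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
b = y_mean - x_mean @ w

scores = X @ w + b                    # linear decision scores
preds = np.where(scores >= 0, 1, -1)  # predicted class = sign of the score
print(np.round(w, 6), round(b, 6))    # [0.439024 0.439024] -1.170732
print((preds == y).mean())            # 1.0
```

The coefficients and intercept match the `coefficients` and `intercepts` arrays in Example 1's expected output.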
This wrapper accepts rows as samples and a target supplied as a single row or single column. It returns training accuracy together with predicted labels, class counts, decision scores, and fitted coefficient arrays.
Excel Usage
=RIDGE_CLASSIFY(data, target, alpha, ridge_solver, fit_intercept, tol, random_state)
- `data` (list[list], required): 2D array of numeric feature data with rows as samples and columns as features.
- `target` (list[list], required): Target labels as a single row, single column, or scalar when only one sample is present.
- `alpha` (float, optional, default: 1): L2 regularization strength applied to the classifier.
- `ridge_solver` (str, optional, default: “auto”): Linear algebra solver used to fit the classifier.
- `fit_intercept` (bool, optional, default: true): Whether to include an intercept term in the linear decision function.
- `tol` (float, optional, default: 0.0001): Convergence tolerance for iterative solvers.
- `random_state` (int, optional, default: null): Integer seed for stochastic solvers. Leave blank for the estimator default.
Returns (dict): Excel data type containing training accuracy, predictions, decision scores, and fitted coefficient arrays.
Example 1: Fit ridge classification for two string-labeled classes
Inputs:
| data | target | alpha | ridge_solver | fit_intercept | tol | random_state | |
|---|---|---|---|---|---|---|---|
| 0 | 0 | cold | 1 | auto | true | 0.0001 | 0 |
| 0 | 1 | cold | |||||
| 1 | 0 | cold | |||||
| 2 | 2 | hot | |||||
| 2 | 3 | hot | |||||
| 3 | 2 | hot |
Excel formula:
=RIDGE_CLASSIFY({0,0;0,1;1,0;2,2;2,3;3,2}, {"cold";"cold";"cold";"hot";"hot";"hot"}, 1, "auto", TRUE, 0.0001, 0)
Expected output:
{"type":"Double","basicValue":1,"properties":{"accuracy":{"type":"Double","basicValue":1},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":2},"class_count":{"type":"Double","basicValue":2},"classes":{"type":"Array","elements":[[{"type":"String","basicValue":"cold"}],[{"type":"String","basicValue":"hot"}]]},"predictions":{"type":"Array","elements":[[{"type":"String","basicValue":"cold"}],[{"type":"String","basicValue":"cold"}],[{"type":"String","basicValue":"cold"}],[{"type":"String","basicValue":"hot"}],[{"type":"String","basicValue":"hot"}],[{"type":"String","basicValue":"hot"}]]},"prediction_counts":{"type":"Array","elements":[[{"type":"String","basicValue":"class"},{"type":"String","basicValue":"count"}],[{"type":"String","basicValue":"cold"},{"type":"Double","basicValue":3}],[{"type":"String","basicValue":"hot"},{"type":"Double","basicValue":3}]]},"decision_scores":{"type":"Array","elements":[[{"type":"Double","basicValue":-1.17073}],[{"type":"Double","basicValue":-0.731707}],[{"type":"Double","basicValue":-0.731707}],[{"type":"Double","basicValue":0.585366}],[{"type":"Double","basicValue":1.02439}],[{"type":"Double","basicValue":1.02439}]]},"coefficients":{"type":"Array","elements":[[{"type":"Double","basicValue":0.439024},{"type":"Double","basicValue":0.439024}]]},"intercepts":{"type":"Array","elements":[[{"type":"Double","basicValue":-1.17073}]]}}}
Example 2: Flatten a single-row numeric target for ridge classification
Inputs:
| data | target | alpha | ridge_solver | fit_intercept | tol | random_state | |||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | auto | true | 0.0001 | 0 |
| 0.2 | |||||||||||
| 0.4 | |||||||||||
| 1.2 | |||||||||||
| 1.4 | |||||||||||
| 1.6 |
Excel formula:
=RIDGE_CLASSIFY({0;0.2;0.4;1.2;1.4;1.6}, {0,0,0,1,1,1}, 1, "auto", TRUE, 0.0001, 0)
Expected output:
{"type":"Double","basicValue":1,"properties":{"accuracy":{"type":"Double","basicValue":1},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":1},"class_count":{"type":"Double","basicValue":2},"classes":{"type":"Array","elements":[[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1}]]},"predictions":{"type":"Array","elements":[[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":1}]]},"prediction_counts":{"type":"Array","elements":[[{"type":"String","basicValue":"class"},{"type":"String","basicValue":"count"}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":3}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":3}]]},"decision_scores":{"type":"Array","elements":[[{"type":"Double","basicValue":-0.86747}],[{"type":"Double","basicValue":-0.650602}],[{"type":"Double","basicValue":-0.433735}],[{"type":"Double","basicValue":0.433735}],[{"type":"Double","basicValue":0.650602}],[{"type":"Double","basicValue":0.86747}]]},"coefficients":{"type":"Array","elements":[[{"type":"Double","basicValue":1.08434}]]},"intercepts":{"type":"Array","elements":[[{"type":"Double","basicValue":-0.86747}]]}}}
Example 3: Fit ridge classification for three separated groups
Inputs:
| data | target | alpha | ridge_solver | fit_intercept | tol | random_state | |
|---|---|---|---|---|---|---|---|
| 0 | 0 | left | 1 | auto | true | 0.0001 | 0 |
| 0.2 | 0.1 | left | |||||
| 4 | 4 | center | |||||
| 4.2 | 3.9 | center | |||||
| 8 | 0 | right | |||||
| 8.2 | 0.1 | right |
Excel formula:
=RIDGE_CLASSIFY({0,0;0.2,0.1;4,4;4.2,3.9;8,0;8.2,0.1}, {"left";"left";"center";"center";"right";"right"}, 1, "auto", TRUE, 0.0001, 0)
Expected output:
{"type":"Double","basicValue":1,"properties":{"accuracy":{"type":"Double","basicValue":1},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":2},"class_count":{"type":"Double","basicValue":3},"classes":{"type":"Array","elements":[[{"type":"String","basicValue":"center"}],[{"type":"String","basicValue":"left"}],[{"type":"String","basicValue":"right"}]]},"predictions":{"type":"Array","elements":[[{"type":"String","basicValue":"left"}],[{"type":"String","basicValue":"left"}],[{"type":"String","basicValue":"center"}],[{"type":"String","basicValue":"center"}],[{"type":"String","basicValue":"right"}],[{"type":"String","basicValue":"right"}]]},"prediction_counts":{"type":"Array","elements":[[{"type":"String","basicValue":"class"},{"type":"String","basicValue":"count"}],[{"type":"String","basicValue":"center"},{"type":"Double","basicValue":2}],[{"type":"String","basicValue":"left"},{"type":"Double","basicValue":2}],[{"type":"String","basicValue":"right"},{"type":"Double","basicValue":2}]]},"decision_scores":{"type":"Array","elements":[[{"type":"Double","basicValue":-0.992335},{"type":"Double","basicValue":1.00431},{"type":"Double","basicValue":-1.01198}],[{"type":"Double","basicValue":-0.943513},{"type":"Double","basicValue":0.930727},{"type":"Double","basicValue":-0.987214}],[{"type":"Double","basicValue":0.960875},{"type":"Double","basicValue":-0.955539},{"type":"Double","basicValue":-1.00534}],[{"type":"Double","basicValue":0.912022},{"type":"Double","basicValue":-0.980309},{"type":"Double","basicValue":-0.931713}],[{"type":"Double","basicValue":-0.992936},{"type":"Double","basicValue":-0.962803},{"type":"Double","basicValue":0.955739}],[{"type":"Double","basicValue":-0.944113},{"type":"Double","basicValue":-1.03639},{"type":"Double","basicValue":0.980501}]]},"coefficients":{"type":"Array","elements":[[{"type":"Double","basicValue":-0.0000750657},{"type":"Double","basicValue":0.488378}],[{"type":"Double","basicValue":-0.245889},{"type":"Double","basicValue":-0.244073}],[{"type":"Double","basicValue":0.245964},{"type":"Double","basicValue":-0.244304}]]},"intercepts":{"type":"Array","elements":[[{"type":"Double","basicValue":-0.992335}],[{"type":"Double","basicValue":1.00431}],[{"type":"Double","basicValue":-1.01198}]]}}}
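With three classes the classifier fits one coefficient row and one intercept per class, and the predicted label is the class whose column has the largest decision score. That argmax step can be checked against Example 3's score matrix (values copied from the expected output above):

```python
import numpy as np

# One row per sample, one column per class, in the fitted class
# order ["center", "left", "right"] from Example 3.
classes = np.array(["center", "left", "right"])
scores = np.array([
    [-0.992335,  1.00431,  -1.01198],
    [-0.943513,  0.930727, -0.987214],
    [ 0.960875, -0.955539, -1.00534],
    [ 0.912022, -0.980309, -0.931713],
    [-0.992936, -0.962803,  0.955739],
    [-0.944113, -1.03639,   0.980501],
])
preds = classes[scores.argmax(axis=1)]  # largest score per row wins
print(preds)  # ['left' 'left' 'center' 'center' 'right' 'right']
```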
Example 4: Fit ridge classification with boolean target labels
Inputs:
| data | target | alpha | ridge_solver | fit_intercept | tol | random_state |
|---|---|---|---|---|---|---|
| 0 | false | 1 | auto | true | 0.0001 | 0 |
| 0.3 | false | |||||
| 0.6 | false | |||||
| 1.4 | true | |||||
| 1.7 | true | |||||
| 2 | true |
Excel formula:
=RIDGE_CLASSIFY({0;0.3;0.6;1.4;1.7;2}, {FALSE;FALSE;FALSE;TRUE;TRUE;TRUE}, 1, "auto", TRUE, 0.0001, 0)
Expected output:
{"type":"Double","basicValue":1,"properties":{"accuracy":{"type":"Double","basicValue":1},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":1},"class_count":{"type":"Double","basicValue":2},"classes":{"type":"Array","elements":[[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":true}]]},"predictions":{"type":"Array","elements":[[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":true}],[{"type":"Boolean","basicValue":true}],[{"type":"Boolean","basicValue":true}]]},"prediction_counts":{"type":"Array","elements":[[{"type":"String","basicValue":"class"},{"type":"String","basicValue":"count"}],[{"type":"Boolean","basicValue":false},{"type":"Double","basicValue":3}],[{"type":"Boolean","basicValue":true},{"type":"Double","basicValue":3}]]},"decision_scores":{"type":"Array","elements":[[{"type":"Double","basicValue":-0.976744}],[{"type":"Double","basicValue":-0.683721}],[{"type":"Double","basicValue":-0.390698}],[{"type":"Double","basicValue":0.390698}],[{"type":"Double","basicValue":0.683721}],[{"type":"Double","basicValue":0.976744}]]},"coefficients":{"type":"Array","elements":[[{"type":"Double","basicValue":0.976744}]]},"intercepts":{"type":"Array","elements":[[{"type":"Double","basicValue":-0.976744}]]}}}
Python Code
import numpy as np
from sklearn.linear_model import RidgeClassifier as SklearnRidgeClassifier
def ridge_classify(data, target, alpha=1, ridge_solver='auto', fit_intercept=True, tol=0.0001, random_state=None):
"""
Fit a ridge classifier and return training predictions with decision scores.
See: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.RidgeClassifier.html
This example function is provided as-is without any representation of accuracy.
Args:
data (list[list]): 2D array of numeric feature data with rows as samples and columns as features.
target (list[list]): Target labels as a single row, single column, or scalar when only one sample is present.
alpha (float, optional): L2 regularization strength applied to the classifier. Default is 1.
ridge_solver (str, optional): Linear algebra solver used to fit the classifier. Valid options: 'auto', 'svd', 'lsqr', 'sag', 'saga'. Default is 'auto'.
fit_intercept (bool, optional): Whether to include an intercept term in the linear decision function. Default is True.
tol (float, optional): Convergence tolerance for iterative solvers. Default is 0.0001.
random_state (int, optional): Integer seed for stochastic solvers. Leave blank for the estimator default. Default is None.
Returns:
dict: Excel data type containing training accuracy, predictions, decision scores, and fitted coefficient arrays.
"""
def py(value):
return value.item() if isinstance(value, np.generic) else value
def cell(value):
value = py(value)
if isinstance(value, bool):
return {"type": "Boolean", "basicValue": bool(value)}
if isinstance(value, (int, float)) and not isinstance(value, bool):
return {"type": "Double", "basicValue": float(value)}
return {"type": "String", "basicValue": str(value)}
def col(values):
return [[cell(value)] for value in values]
def mat(values):
return [[cell(value) for value in row] for row in values]
def parse_data(value):
value = [[value]] if not isinstance(value, list) else value
if not isinstance(value, list) or not value or not all(isinstance(row, list) and row for row in value):
return None, "Error: data must be a non-empty 2D list"
if len({len(row) for row in value}) != 1:
return None, "Error: data must be a rectangular 2D list"
data_np = np.array(value, dtype=float)
if data_np.ndim != 2 or data_np.size == 0:
return None, "Error: data must be a non-empty 2D list"
if not np.isfinite(data_np).all():
return None, "Error: data must contain only finite numeric values"
return data_np, None
def parse_target(value, sample_count):
if not isinstance(value, list):
labels = [value]
elif not value:
return None, "Error: target must be non-empty"
elif all(not isinstance(item, list) for item in value):
labels = value
elif len(value) == 1:
labels = value[0]
elif all(isinstance(row, list) and len(row) == 1 for row in value):
labels = [row[0] for row in value]
else:
return None, "Error: target must be a single row or column"
if len(labels) != sample_count:
return None, "Error: target length must match sample count"
parsed = []
classes = []
for item in labels:
item = py(item)
if isinstance(item, str):
if not item.strip():
return None, "Error: target labels must not be blank"
elif isinstance(item, bool):
item = bool(item)
elif isinstance(item, (int, float)) and not isinstance(item, bool):
if not np.isfinite(float(item)):
return None, "Error: target labels must be finite"
item = float(item) if isinstance(item, float) else int(item)
else:
return None, "Error: target labels must be scalar string, boolean, or numeric values"
parsed.append(item)
if not any(type(existing) is type(item) and existing == item for existing in classes):
classes.append(item)
if len(classes) < 2:
return None, "Error: target must contain at least 2 classes"
return parsed, None
def count_table(predictions, classes):
rows = [[{"type": "String", "basicValue": "class"}, {"type": "String", "basicValue": "count"}]]
for class_label in classes:
count = sum(type(prediction) is type(class_label) and prediction == class_label for prediction in predictions)
rows.append([cell(class_label), {"type": "Double", "basicValue": float(count)}])
return rows
def score_rows(values):
values = np.asarray(values)
return [[float(value)] for value in values.tolist()] if values.ndim == 1 else values.tolist()
try:
data_np, error = parse_data(data)
if error:
return error
target_values, error = parse_target(target, data_np.shape[0])
if error:
return error
solver_value = str(ridge_solver).strip().lower()
if solver_value not in {"auto", "svd", "lsqr", "sag", "saga"}:
return "Error: solver must be 'auto', 'svd', 'lsqr', 'sag', or 'saga'"
if float(alpha) < 0:
return "Error: alpha must be non-negative"
if float(tol) <= 0:
return "Error: tol must be greater than 0"
fitted = SklearnRidgeClassifier(
alpha=float(alpha),
solver=solver_value,
fit_intercept=bool(fit_intercept),
tol=float(tol),
random_state=None if random_state in (None, "") else int(random_state)
).fit(data_np, target_values)
prediction_array = fitted.predict(data_np)
predictions = [py(item) for item in prediction_array.tolist()]
classes = [py(item) for item in fitted.classes_.tolist()]
accuracy = float(np.mean([
type(prediction) is type(actual) and prediction == actual
for prediction, actual in zip(predictions, target_values)
]))
return {
"type": "Double",
"basicValue": accuracy,
"properties": {
"accuracy": {"type": "Double", "basicValue": accuracy},
"sample_count": {"type": "Double", "basicValue": float(data_np.shape[0])},
"feature_count": {"type": "Double", "basicValue": float(data_np.shape[1])},
"class_count": {"type": "Double", "basicValue": float(len(classes))},
"classes": {"type": "Array", "elements": col(classes)},
"predictions": {"type": "Array", "elements": col(predictions)},
"prediction_counts": {"type": "Array", "elements": count_table(predictions, classes)},
"decision_scores": {"type": "Array", "elements": mat(score_rows(fitted.decision_function(data_np)))},
"coefficients": {"type": "Array", "elements": mat(np.atleast_2d(fitted.coef_).tolist())},
"intercepts": {"type": "Array", "elements": col(np.atleast_1d(fitted.intercept_).tolist())}
}
}
except Exception as e:
return f"Error: {str(e)}"
SVC_CLASSIFY
Support vector classification (SVC) fits separating hyperplanes that maximize the margin between classes. Through the “kernel trick,” SVC fits a linear boundary in an implicit higher-dimensional feature space, which corresponds to a curved decision boundary in the original input space. The decision function for a sample x is:
f(x) = \sum_{i=1}^n \alpha_i y_i K(x_i, x) + b
where K(x_i, x) is the kernel function. For the common Radial Basis Function (RBF) kernel, it is defined as:
K(x, x') = \exp(-\gamma \|x - x'\|^2)
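The kernel matrix and scikit-learn's `gamma='scale'` rule, gamma = 1 / (n_features · Var(X)), can be sketched in NumPy — an illustration of the formula above, not the wrapper's code path:

```python
import numpy as np

def rbf_gram(X, gamma):
    # ||x - x'||^2 = ||x||^2 + ||x'||^2 - 2 x.x', clipped at 0 for safety
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-gamma * d2)

# Example 1 data; gamma follows sklearn's 'scale' rule: 1 / (n_features * Var(X))
X = np.array([[0, 0], [0, 1], [1, 0], [2, 2], [2, 3], [3, 2]], dtype=float)
gamma = 1.0 / (X.shape[1] * X.var())
K = rbf_gram(X, gamma)
print(K.shape)     # (6, 6)
print(np.diag(K))  # all ones, since K(x, x) = exp(0) = 1
```

Each entry of `K` lies in (0, 1], shrinking toward 0 as samples move apart; the fitted decision function is a weighted sum of these kernel values over the support vectors.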
This wrapper accepts rows as samples and a target supplied as a single row or single column. It returns training accuracy together with predicted labels, class counts, decision scores, and support-vector summary properties.
Excel Usage
=SVC_CLASSIFY(data, target, C, svc_kernel, degree, svc_gamma, tol, random_state)
- `data` (list[list], required): 2D array of numeric feature data with rows as samples and columns as features.
- `target` (list[list], required): Target labels as a single row, single column, or scalar when only one sample is present.
- `C` (float, optional, default: 1): Inverse regularization strength. Smaller values apply stronger regularization.
- `svc_kernel` (str, optional, default: “rbf”): Kernel function used to build the separating boundary.
- `degree` (int, optional, default: 3): Polynomial degree when the polynomial kernel is used.
- `svc_gamma` (str, optional, default: “scale”): Gamma scaling mode for non-linear kernels.
- `tol` (float, optional, default: 0.001): Convergence tolerance for the optimizer.
- `random_state` (int, optional, default: null): Integer seed for operations that use randomness. Leave blank for the estimator default.
Returns (dict): Excel data type containing training accuracy, predictions, decision scores, and support-vector summary properties.
Example 1: Fit an RBF support vector classifier for two string-labeled groups
Inputs:
| data | target | C | svc_kernel | degree | svc_gamma | tol | random_state | |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | cold | 1 | rbf | 3 | scale | 0.001 | 0 |
| 0 | 1 | cold | ||||||
| 1 | 0 | cold | ||||||
| 2 | 2 | hot | ||||||
| 2 | 3 | hot | ||||||
| 3 | 2 | hot |
Excel formula:
=SVC_CLASSIFY({0,0;0,1;1,0;2,2;2,3;3,2}, {"cold";"cold";"cold";"hot";"hot";"hot"}, 1, "rbf", 3, "scale", 0.001, 0)
Expected output:
{"type":"Double","basicValue":1,"properties":{"accuracy":{"type":"Double","basicValue":1},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":2},"class_count":{"type":"Double","basicValue":2},"classes":{"type":"Array","elements":[[{"type":"String","basicValue":"cold"}],[{"type":"String","basicValue":"hot"}]]},"predictions":{"type":"Array","elements":[[{"type":"String","basicValue":"cold"}],[{"type":"String","basicValue":"cold"}],[{"type":"String","basicValue":"cold"}],[{"type":"String","basicValue":"hot"}],[{"type":"String","basicValue":"hot"}],[{"type":"String","basicValue":"hot"}]]},"prediction_counts":{"type":"Array","elements":[[{"type":"String","basicValue":"class"},{"type":"String","basicValue":"count"}],[{"type":"String","basicValue":"cold"},{"type":"Double","basicValue":3}],[{"type":"String","basicValue":"hot"},{"type":"Double","basicValue":3}]]},"decision_scores":{"type":"Array","elements":[[{"type":"Double","basicValue":-0.999733}],[{"type":"Double","basicValue":-0.999851}],[{"type":"Double","basicValue":-0.999782}],[{"type":"Double","basicValue":0.999772}],[{"type":"Double","basicValue":0.999772}],[{"type":"Double","basicValue":0.999822}]]},"support_vector_count":{"type":"Double","basicValue":6},"support_counts":{"type":"Array","elements":[[{"type":"Double","basicValue":3}],[{"type":"Double","basicValue":3}]]},"support_indices":{"type":"Array","elements":[[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":2}],[{"type":"Double","basicValue":3}],[{"type":"Double","basicValue":4}],[{"type":"Double","basicValue":5}]]}}}
Example 2: Fit a linear support vector classifier for one-dimensional numeric labels
Inputs:
| data | target | C | svc_kernel | degree | svc_gamma | tol | random_state |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | linear | 3 | scale | 0.001 | 0 |
| 0.2 | 0 | ||||||
| 0.4 | 0 | ||||||
| 1.2 | 1 | ||||||
| 1.4 | 1 | ||||||
| 1.6 | 1 |
Excel formula:
=SVC_CLASSIFY({0;0.2;0.4;1.2;1.4;1.6}, {0;0;0;1;1;1}, 1, "linear", 3, "scale", 0.001, 0)
Expected output:
{"type":"Double","basicValue":1,"properties":{"accuracy":{"type":"Double","basicValue":1},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":1},"class_count":{"type":"Double","basicValue":2},"classes":{"type":"Array","elements":[[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1}]]},"predictions":{"type":"Array","elements":[[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":1}]]},"prediction_counts":{"type":"Array","elements":[[{"type":"String","basicValue":"class"},{"type":"String","basicValue":"count"}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":3}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":3}]]},"decision_scores":{"type":"Array","elements":[[{"type":"Double","basicValue":-1.33333}],[{"type":"Double","basicValue":-1}],[{"type":"Double","basicValue":-0.666667}],[{"type":"Double","basicValue":0.666667}],[{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":1.33333}]]},"support_vector_count":{"type":"Double","basicValue":4},"support_counts":{"type":"Array","elements":[[{"type":"Double","basicValue":2}],[{"type":"Double","basicValue":2}]]},"support_indices":{"type":"Array","elements":[[{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":2}],[{"type":"Double","basicValue":3}],[{"type":"Double","basicValue":4}]]}}}
Example 3: Fit a support vector classifier for three separated groups
Inputs:
| data | target | C | svc_kernel | degree | svc_gamma | tol | random_state | |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | left | 1 | rbf | 3 | scale | 0.001 | 0 |
| 0.2 | 0.1 | left | ||||||
| 4 | 4 | center | ||||||
| 4.2 | 3.9 | center | ||||||
| 8 | 0 | right | ||||||
| 8.2 | 0.1 | right |
Excel formula:
=SVC_CLASSIFY({0,0;0.2,0.1;4,4;4.2,3.9;8,0;8.2,0.1}, {"left";"left";"center";"center";"right";"right"}, 1, "rbf", 3, "scale", 0.001, 0)
Expected output:
{"type":"Double","basicValue":1,"properties":{"accuracy":{"type":"Double","basicValue":1},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":2},"class_count":{"type":"Double","basicValue":3},"classes":{"type":"Array","elements":[[{"type":"String","basicValue":"center"}],[{"type":"String","basicValue":"left"}],[{"type":"String","basicValue":"right"}]]},"predictions":{"type":"Array","elements":[[{"type":"String","basicValue":"left"}],[{"type":"String","basicValue":"left"}],[{"type":"String","basicValue":"center"}],[{"type":"String","basicValue":"center"}],[{"type":"String","basicValue":"right"}],[{"type":"String","basicValue":"right"}]]},"prediction_counts":{"type":"Array","elements":[[{"type":"String","basicValue":"class"},{"type":"String","basicValue":"count"}],[{"type":"String","basicValue":"center"},{"type":"Double","basicValue":2}],[{"type":"String","basicValue":"left"},{"type":"Double","basicValue":2}],[{"type":"String","basicValue":"right"},{"type":"Double","basicValue":2}]]},"decision_scores":{"type":"Array","elements":[[{"type":"Double","basicValue":0.847182},{"type":"Double","basicValue":2.22222},{"type":"Double","basicValue":-0.178542}],[{"type":"Double","basicValue":0.851956},{"type":"Double","basicValue":2.22115},{"type":"Double","basicValue":-0.17988}],[{"type":"Double","basicValue":2.22194},{"type":"Double","basicValue":0.835983},{"type":"Double","basicValue":-0.168615}],[{"type":"Double","basicValue":2.22125},{"type":"Double","basicValue":-0.167408},{"type":"Double","basicValue":0.836299}],[{"type":"Double","basicValue":0.850081},{"type":"Double","basicValue":-0.17988},{"type":"Double","basicValue":2.22184}],[{"type":"Double","basicValue":0.848998},{"type":"Double","basicValue":-0.179853},{"type":"Double","basicValue":2.22222}]]},"support_vector_count":{"type":"Double","basicValue":6},"support_counts":{"type":"Array","elements":[[{"type":"Double","basicValue":2}],[{"type":"Double","basicValue":2}],[{"type":"Double","basicValue":2}]]},"support_indices":{"type":"Array","elements":[[{"type":"Double","basicValue":2}],[{"type":"Double","basicValue":3}],[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":4}],[{"type":"Double","basicValue":5}]]}}}
Example 4: Flatten a single-row boolean target range for support vector classification
Inputs:
| data | target | C | svc_kernel | degree | svc_gamma | tol | random_state | |||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | false | false | false | true | true | true | 1 | linear | 3 | scale | 0.001 | 0 |
| 0.3 | ||||||||||||
| 0.6 | ||||||||||||
| 1.4 | ||||||||||||
| 1.7 | ||||||||||||
| 2 |
Excel formula:
=SVC_CLASSIFY({0;0.3;0.6;1.4;1.7;2}, {FALSE,FALSE,FALSE,TRUE,TRUE,TRUE}, 1, "linear", 3, "scale", 0.001, 0)
Expected output:
{"type":"Double","basicValue":1,"properties":{"accuracy":{"type":"Double","basicValue":1},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":1},"class_count":{"type":"Double","basicValue":2},"classes":{"type":"Array","elements":[[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":true}]]},"predictions":{"type":"Array","elements":[[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":true}],[{"type":"Boolean","basicValue":true}],[{"type":"Boolean","basicValue":true}]]},"prediction_counts":{"type":"Array","elements":[[{"type":"String","basicValue":"class"},{"type":"String","basicValue":"count"}],[{"type":"Boolean","basicValue":false},{"type":"Double","basicValue":3}],[{"type":"Boolean","basicValue":true},{"type":"Double","basicValue":3}]]},"decision_scores":{"type":"Array","elements":[[{"type":"Double","basicValue":-1.42857}],[{"type":"Double","basicValue":-1}],[{"type":"Double","basicValue":-0.571429}],[{"type":"Double","basicValue":0.571428}],[{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":1.42857}]]},"support_vector_count":{"type":"Double","basicValue":4},"support_counts":{"type":"Array","elements":[[{"type":"Double","basicValue":2}],[{"type":"Double","basicValue":2}]]},"support_indices":{"type":"Array","elements":[[{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":2}],[{"type":"Double","basicValue":3}],[{"type":"Double","basicValue":4}]]}}}
Python Code
import numpy as np
from sklearn.svm import SVC as SklearnSVC
def svc_classify(data, target, C=1, svc_kernel='rbf', degree=3, svc_gamma='scale', tol=0.001, random_state=None):
"""
Fit a support vector classifier and return training predictions.
See: https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html
This example function is provided as-is without any representation of accuracy.
Args:
data (list[list]): 2D array of numeric feature data with rows as samples and columns as features.
target (list[list]): Target labels as a single row, single column, or scalar when only one sample is present.
C (float, optional): Inverse regularization strength. Smaller values apply stronger regularization. Default is 1.
svc_kernel (str, optional): Kernel function used to build the separating boundary. Valid options: 'rbf', 'linear', 'poly', 'sigmoid'. Default is 'rbf'.
degree (int, optional): Polynomial degree when the polynomial kernel is used. Default is 3.
svc_gamma (str, optional): Gamma scaling mode for non-linear kernels. Valid options: 'scale', 'auto'. Default is 'scale'.
tol (float, optional): Convergence tolerance for the optimizer. Default is 0.001.
random_state (int, optional): Integer seed for operations that use randomness. Leave blank for the estimator default. Default is None.
Returns:
dict: Excel data type containing training accuracy, predictions, decision scores, and support-vector summary properties.
"""
def py(value):
return value.item() if isinstance(value, np.generic) else value
def cell(value):
value = py(value)
if isinstance(value, bool):
return {"type": "Boolean", "basicValue": bool(value)}
if isinstance(value, (int, float)) and not isinstance(value, bool):
return {"type": "Double", "basicValue": float(value)}
return {"type": "String", "basicValue": str(value)}
def col(values):
return [[cell(value)] for value in values]
def mat(values):
return [[cell(value) for value in row] for row in values]
    def parse_data(value):
        # Validate and coerce the feature range into a finite 2D float array.
        value = [[value]] if not isinstance(value, list) else value
        if not isinstance(value, list) or not value or not all(isinstance(row, list) and row for row in value):
            return None, "Error: data must be a non-empty 2D list"
        if len({len(row) for row in value}) != 1:
            return None, "Error: data must be a rectangular 2D list"
        data_np = np.array(value, dtype=float)
        if data_np.ndim != 2 or data_np.size == 0:
            return None, "Error: data must be a non-empty 2D list"
        if not np.isfinite(data_np).all():
            return None, "Error: data must contain only finite numeric values"
        return data_np, None
    def parse_target(value, sample_count):
        # Accept a scalar, a single row, or a single column of labels.
        if not isinstance(value, list):
            labels = [value]
        elif not value:
            return None, "Error: target must be non-empty"
        elif all(not isinstance(item, list) for item in value):
            labels = value
        elif len(value) == 1:
            labels = value[0]
        elif all(isinstance(row, list) and len(row) == 1 for row in value):
            labels = [row[0] for row in value]
        else:
            return None, "Error: target must be a single row or column"
        if len(labels) != sample_count:
            return None, "Error: target length must match sample count"
        parsed = []
        classes = []
        for item in labels:
            item = py(item)
            if isinstance(item, str):
                if not item.strip():
                    return None, "Error: target labels must not be blank"
            elif isinstance(item, bool):
                item = bool(item)
            elif isinstance(item, (int, float)) and not isinstance(item, bool):
                if not np.isfinite(float(item)):
                    return None, "Error: target labels must be finite"
                item = float(item) if isinstance(item, float) else int(item)
            else:
                return None, "Error: target labels must be scalar string, boolean, or numeric values"
            parsed.append(item)
            # Track distinct classes, treating values of different types as distinct.
            if not any(type(existing) is type(item) and existing == item for existing in classes):
                classes.append(item)
        if len(classes) < 2:
            return None, "Error: target must contain at least 2 classes"
        return parsed, None
    def count_table(predictions, classes):
        # Build a header row plus per-class prediction counts.
        rows = [[{"type": "String", "basicValue": "class"}, {"type": "String", "basicValue": "count"}]]
        for class_label in classes:
            count = sum(type(prediction) is type(class_label) and prediction == class_label for prediction in predictions)
            rows.append([cell(class_label), {"type": "Double", "basicValue": float(count)}])
        return rows

    def score_rows(values):
        # Binary problems yield a 1D decision function; promote it to a single column.
        values = np.asarray(values)
        return [[float(value)] for value in values.tolist()] if values.ndim == 1 else values.tolist()
    try:
        data_np, error = parse_data(data)
        if error:
            return error
        target_values, error = parse_target(target, data_np.shape[0])
        if error:
            return error
        if float(C) <= 0:
            return "Error: C must be greater than 0"
        kernel_value = str(svc_kernel).strip().lower()
        if kernel_value not in {"rbf", "linear", "poly", "sigmoid"}:
            return "Error: svc_kernel must be 'rbf', 'linear', 'poly', or 'sigmoid'"
        if int(degree) < 1:
            return "Error: degree must be at least 1"
        gamma_value = str(svc_gamma).strip().lower()
        if gamma_value not in {"scale", "auto"}:
            return "Error: svc_gamma must be 'scale' or 'auto'"
        if float(tol) <= 0:
            return "Error: tol must be greater than 0"
        fitted = SklearnSVC(
            C=float(C),
            kernel=kernel_value,
            degree=int(degree),
            gamma=gamma_value,
            tol=float(tol),
            random_state=None if random_state in (None, "") else int(random_state)
        ).fit(data_np, target_values)
        prediction_array = fitted.predict(data_np)
        predictions = [py(item) for item in prediction_array.tolist()]
        classes = [py(item) for item in fitted.classes_.tolist()]
        accuracy = float(np.mean([
            type(prediction) is type(actual) and prediction == actual
            for prediction, actual in zip(predictions, target_values)
        ]))
        return {
            "type": "Double",
            "basicValue": accuracy,
            "properties": {
                "accuracy": {"type": "Double", "basicValue": accuracy},
                "sample_count": {"type": "Double", "basicValue": float(data_np.shape[0])},
                "feature_count": {"type": "Double", "basicValue": float(data_np.shape[1])},
                "class_count": {"type": "Double", "basicValue": float(len(classes))},
                "classes": {"type": "Array", "elements": col(classes)},
                "predictions": {"type": "Array", "elements": col(predictions)},
                "prediction_counts": {"type": "Array", "elements": count_table(predictions, classes)},
                "decision_scores": {"type": "Array", "elements": mat(score_rows(fitted.decision_function(data_np)))},
                "support_vector_count": {"type": "Double", "basicValue": float(len(fitted.support_))},
                "support_counts": {"type": "Array", "elements": col(fitted.n_support_.tolist())},
                "support_indices": {"type": "Array", "elements": col(fitted.support_.tolist())}
            }
        }
    except Exception as e:
        return f"Error: {str(e)}"
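Once the inputs pass validation, the fit itself is a plain scikit-learn call. A minimal sketch of that underlying fit, assuming scikit-learn and NumPy are installed and using toy two-class data in the spirit of the examples above:

```python
import numpy as np
from sklearn.svm import SVC

# Toy two-class data: three samples near the origin, three near (2, 2).
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.0],
              [2.0, 2.0], [2.1, 2.2], [2.2, 2.0]])
y = ["low", "low", "low", "high", "high", "high"]

# Same defaults the wrapper validates: C=1, 'rbf' kernel, degree 3,
# gamma='scale', tol=0.001.
model = SVC(C=1.0, kernel="rbf", degree=3, gamma="scale", tol=1e-3).fit(X, y)

predictions = model.predict(X)
accuracy = float(np.mean(predictions == np.array(y)))
scores = model.decision_function(X)        # 1D array for binary problems
support_count = int(len(model.support_))   # number of support vectors
```

For a binary problem `decision_function` returns one score per sample, which is why the `score_rows` helper promotes a 1D array to a single column before it is packed into the Excel array.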