GAUSSIAN_NB

Gaussian naive Bayes is a probabilistic classifier that applies Bayes’ theorem with the “naive” assumption of conditional independence between every pair of features given the value of the class variable:

P(y \mid x_1, \dots, x_n) \propto P(y) \prod_{i=1}^n P(x_i \mid y)

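In Gaussian naive Bayes, the likelihood of each feature is assumed to follow a normal distribution whose mean \mu_{iy} and variance \sigma_{iy}^2 are estimated separately for every feature i and class y: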

P(x_i \mid y) = \frac{1}{\sqrt{2\pi\sigma_{iy}^2}} \exp\left(-\frac{(x_i - \mu_{iy})^2}{2\sigma_{iy}^2}\right)
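
The two formulas above can be combined into a small posterior calculation. The following is a minimal sketch in plain Python, not the implementation used by this function; the helper names gaussian_log_likelihood and posterior are hypothetical, and the parameters are the fitted values reported in Example 2 with the smoothing term ignored.

```python
import math


def gaussian_log_likelihood(x, mu, var):
    """Log of the Gaussian density P(x_i | y) for one feature value."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)


def posterior(sample, priors, means, variances):
    """Return P(y | sample) for each class, normalised to sum to 1."""
    log_joint = []
    for prior, mu_row, var_row in zip(priors, means, variances):
        total = math.log(prior)
        for x, mu, var in zip(sample, mu_row, var_row):
            total += gaussian_log_likelihood(x, mu, var)
        log_joint.append(total)
    # Subtract the maximum before exponentiating to limit underflow.
    peak = max(log_joint)
    weights = [math.exp(value - peak) for value in log_joint]
    norm = sum(weights)
    return [weight / norm for weight in weights]


# Assumed inputs: class 0 has mean 0.2, class 1 has mean 1.4,
# and both share the variance 0.08 / 3 (about 0.0266667).
probs = posterior(
    [0.0],
    priors=[0.5, 0.5],
    means=[[0.2], [1.4]],
    variances=[[0.08 / 3], [0.08 / 3]],
)
print(probs)
```

Working in log space and subtracting the largest value before exponentiating keeps the normalisation stable when one class is far more likely than the others.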

This implementation accepts rows as samples and a target supplied as a single row or single column. It returns training accuracy together with predicted labels, class counts, fitted probabilities, class priors, and per-class Gaussian parameters.

Excel Usage

=GAUSSIAN_NB(data, target, var_smoothing)
  • data (list[list], required): 2D array of numeric feature data with rows as samples and columns as features.
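  • target (list[list], required): Target labels as a single row or single column with one label per sample. Labels may be strings, booleans, or numbers, and at least two distinct classes are required.
  • var_smoothing (float, optional, default: 1e-9): Fraction of the largest feature variance added to every per-class variance for numerical stability. Must be greater than 0.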

Returns (dict): Excel data type containing training accuracy, predictions, probabilities, and fitted Gaussian parameter arrays.
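
The effect of var_smoothing can be checked by hand. The sketch below assumes the behavior of scikit-learn's GaussianNB, which the Python code on this page wraps: the smoothing term is var_smoothing times the largest population variance of any feature over the whole training set, and it is added to each per-class variance. The helper name smoothing_epsilon is hypothetical, and the data are those of Example 3.

```python
from statistics import pvariance


def smoothing_epsilon(rows, var_smoothing=1e-9):
    """Smoothing term: var_smoothing times the largest feature variance."""
    columns = list(zip(*rows))  # transpose rows into per-feature columns
    return var_smoothing * max(pvariance(column) for column in columns)


# Feature rows from Example 3 (two features, six samples).
rows = [[0, 0], [0.2, 0.1], [4, 4], [4.2, 3.9], [8, 0], [8.2, 0.1]]
epsilon = smoothing_epsilon(rows)

# Second feature of the "left" class: raw variance plus the smoothing term.
left_feature_2 = pvariance([0, 0.1]) + epsilon
print(epsilon, left_feature_2)
```

Under these assumptions the raw variance of 0.0025 becomes roughly 0.00250001, which is the value shown in the variances array of Example 3.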

Example 1: Fit Gaussian naive Bayes for two string-labeled classes

Inputs:

data          target    var_smoothing
0      0      low       1e-9
0.1    0.2    low
0.2    0      low
2      2      high
2.1    2.2    high
2.2    2      high

Excel formula:

=GAUSSIAN_NB({0,0;0.1,0.2;0.2,0;2,2;2.1,2.2;2.2,2}, {"low";"low";"low";"high";"high";"high"}, 1e-9)

Expected output:

{"type":"Double","basicValue":1,"properties":{"accuracy":{"type":"Double","basicValue":1},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":2},"class_count":{"type":"Double","basicValue":2},"classes":{"type":"Array","elements":[[{"type":"String","basicValue":"high"}],[{"type":"String","basicValue":"low"}]]},"predictions":{"type":"Array","elements":[[{"type":"String","basicValue":"low"}],[{"type":"String","basicValue":"low"}],[{"type":"String","basicValue":"low"}],[{"type":"String","basicValue":"high"}],[{"type":"String","basicValue":"high"}],[{"type":"String","basicValue":"high"}]]},"prediction_counts":{"type":"Array","elements":[[{"type":"String","basicValue":"class"},{"type":"String","basicValue":"count"}],[{"type":"String","basicValue":"high"},{"type":"Double","basicValue":3}],[{"type":"String","basicValue":"low"},{"type":"Double","basicValue":3}]]},"probabilities":{"type":"Array","elements":[[{"type":"Double","basicValue":2.83256e-248},{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":1.05745e-215},{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":3.23477e-222},{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":3.45682e-209}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":9.25966e-242}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":3.027e-235}]]},"class_priors":{"type":"Array","elements":[[{"type":"Double","basicValue":0.5}],[{"type":"Double","basicValue":0.5}]]},"theta":{"type":"Array","elements":[[{"type":"Double","basicValue":2.1},{"type":"Double","basicValue":2.06667}],[{"type":"Double","basicValue":0.1},{"type":"Double","basicValue":0.0666667}]]},"variances":{"type":"Array","elements":[[{"type":"Double","basicValue":0.00666667},{"type":"Double","basicValue":0.00888889}],[{"type":"Double","basicValue":0.00666667},{"type":"Double","basicValue":0.00888889}]]}}}
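
The nested output can be awkward to read directly. As a minimal sketch, assuming the result keeps the "properties" layout shown above, the hypothetical helpers unwrap and unwrap_properties below strip the type wrappers and return plain Python values.

```python
def unwrap(entity):
    """Convert one Excel data-type entry into a plain Python value."""
    if entity.get("type") == "Array":
        # Arrays hold rows of cells; keep the row/column layout.
        return [[cell["basicValue"] for cell in row] for row in entity["elements"]]
    return entity["basicValue"]


def unwrap_properties(result):
    """Map each property name of a result to its plain Python value."""
    return {name: unwrap(value) for name, value in result["properties"].items()}


# A trimmed-down result in the same shape as the expected output above.
result = {
    "type": "Double",
    "basicValue": 1,
    "properties": {
        "accuracy": {"type": "Double", "basicValue": 1},
        "classes": {"type": "Array", "elements": [
            [{"type": "String", "basicValue": "high"}],
            [{"type": "String", "basicValue": "low"}],
        ]},
        "class_priors": {"type": "Array", "elements": [
            [{"type": "Double", "basicValue": 0.5}],
            [{"type": "Double", "basicValue": 0.5}],
        ]},
    },
}

plain = unwrap_properties(result)
print(plain["classes"])
```

The columns of the probabilities array follow the order of the classes array, so unwrapping both makes it easier to pair each probability with its label.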

Example 2: Fit Gaussian naive Bayes for one-dimensional numeric labels

Inputs:

data    target    var_smoothing
0       0         1e-9
0.2     0
0.4     0
1.2     1
1.4     1
1.6     1

Excel formula:

=GAUSSIAN_NB({0;0.2;0.4;1.2;1.4;1.6}, {0;0;0;1;1;1}, 1e-9)

Expected output:

{"type":"Double","basicValue":1,"properties":{"accuracy":{"type":"Double","basicValue":1},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":1},"class_count":{"type":"Double","basicValue":2},"classes":{"type":"Array","elements":[[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1}]]},"predictions":{"type":"Array","elements":[[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":1}]]},"prediction_counts":{"type":"Array","elements":[[{"type":"String","basicValue":"class"},{"type":"String","basicValue":"count"}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":3}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":3}]]},"probabilities":{"type":"Array","elements":[[{"type":"Double","basicValue":1},{"type":"Double","basicValue":2.31952e-16}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":1.87953e-12}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":1.523e-8}],[{"type":"Double","basicValue":1.523e-8},{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":1.87953e-12},{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":2.31952e-16},{"type":"Double","basicValue":1}]]},"class_priors":{"type":"Array","elements":[[{"type":"Double","basicValue":0.5}],[{"type":"Double","basicValue":0.5}]]},"theta":{"type":"Array","elements":[[{"type":"Double","basicValue":0.2}],[{"type":"Double","basicValue":1.4}]]},"variances":{"type":"Array","elements":[[{"type":"Double","basicValue":0.0266667}],[{"type":"Double","basicValue":0.0266667}]]}}}

Example 3: Fit Gaussian naive Bayes for three separated groups

Inputs:

data          target    var_smoothing
0      0      left      1e-9
0.2    0.1    left
4      4      center
4.2    3.9    center
8      0      right
8.2    0.1    right

Excel formula:

=GAUSSIAN_NB({0,0;0.2,0.1;4,4;4.2,3.9;8,0;8.2,0.1}, {"left";"left";"center";"center";"right";"right"}, 1e-9)

Expected output:

{"type":"Double","basicValue":1,"properties":{"accuracy":{"type":"Double","basicValue":1},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":2},"class_count":{"type":"Double","basicValue":3},"classes":{"type":"Array","elements":[[{"type":"String","basicValue":"center"}],[{"type":"String","basicValue":"left"}],[{"type":"String","basicValue":"right"}]]},"predictions":{"type":"Array","elements":[[{"type":"String","basicValue":"left"}],[{"type":"String","basicValue":"left"}],[{"type":"String","basicValue":"center"}],[{"type":"String","basicValue":"center"}],[{"type":"String","basicValue":"right"}],[{"type":"String","basicValue":"right"}]]},"prediction_counts":{"type":"Array","elements":[[{"type":"String","basicValue":"class"},{"type":"String","basicValue":"count"}],[{"type":"String","basicValue":"center"},{"type":"Double","basicValue":2}],[{"type":"String","basicValue":"left"},{"type":"Double","basicValue":2}],[{"type":"String","basicValue":"right"},{"type":"Double","basicValue":2}]]},"probabilities":{"type":"Array","elements":[[{"type":"Double","basicValue":0},{"type":"Double","basicValue":1},{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":1},{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":0},{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":0},{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":0},{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":0},{"type":"Double","basicValue":1}]]},"class_priors":{"type":"Array","elements":[[{"type":"Double","basicValue":0.333333}],[{"type":"Double","basicValue":0.333333}],[{"type":"Double","basicValue":0.333333}]]},"theta":{"type":"Array","elements":[[{"type":"Double","basicValue":4.1},{"type":"Double","basicValue":3.95}],[{"type":"Double","basicValue":0.1},{"type":"Double","basicValue":0.05}],[{"type":"Double","basicValue":8.1},{"type":"Double","basicValue":0.05}]]},"variances":{"type":"Array","elements":[[{"type":"Double","basicValue":0.01},{"type":"Double","basicValue":0.00250001}],[{"type":"Double","basicValue":0.01},{"type":"Double","basicValue":0.00250001}],[{"type":"Double","basicValue":0.01},{"type":"Double","basicValue":0.00250001}]]}}}

Example 4: Flatten a single-row boolean target range for Gaussian naive Bayes

Inputs:

data    target (one row)                       var_smoothing
0       false false false true true true       1e-9
0.3
0.6
1.4
1.7
2

Excel formula:

=GAUSSIAN_NB({0;0.3;0.6;1.4;1.7;2}, {FALSE,FALSE,FALSE,TRUE,TRUE,TRUE}, 1e-9)

Expected output:

{"type":"Double","basicValue":1,"properties":{"accuracy":{"type":"Double","basicValue":1},"sample_count":{"type":"Double","basicValue":6},"feature_count":{"type":"Double","basicValue":1},"class_count":{"type":"Double","basicValue":2},"classes":{"type":"Array","elements":[[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":true}]]},"predictions":{"type":"Array","elements":[[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":true}],[{"type":"Boolean","basicValue":true}],[{"type":"Boolean","basicValue":true}]]},"prediction_counts":{"type":"Array","elements":[[{"type":"String","basicValue":"class"},{"type":"String","basicValue":"count"}],[{"type":"Boolean","basicValue":false},{"type":"Double","basicValue":3}],[{"type":"Boolean","basicValue":true},{"type":"Double","basicValue":3}]]},"probabilities":{"type":"Array","elements":[[{"type":"Double","basicValue":1},{"type":"Double","basicValue":7.35296e-11}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":8.0635e-8}],[{"type":"Double","basicValue":0.999912},{"type":"Double","basicValue":0.0000884192}],[{"type":"Double","basicValue":0.0000884192},{"type":"Double","basicValue":0.999912}],[{"type":"Double","basicValue":8.0635e-8},{"type":"Double","basicValue":1}],[{"type":"Double","basicValue":7.35296e-11},{"type":"Double","basicValue":1}]]},"class_priors":{"type":"Array","elements":[[{"type":"Double","basicValue":0.5}],[{"type":"Double","basicValue":0.5}]]},"theta":{"type":"Array","elements":[[{"type":"Double","basicValue":0.3}],[{"type":"Double","basicValue":1.7}]]},"variances":{"type":"Array","elements":[[{"type":"Double","basicValue":0.06}],[{"type":"Double","basicValue":0.06}]]}}}
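
Example 4 passes the target as a single row rather than a column. The sketch below shows one way such a range can be flattened into a one-dimensional list of labels before fitting; flatten_target is a hypothetical helper written for illustration, and it only approximates the target handling in the Python code on this page.

```python
def flatten_target(target):
    """Flatten a scalar, 1D list, single row, or single column into a list."""
    if not isinstance(target, list):
        return [target]  # a lone scalar becomes a one-item list
    if not target:
        raise ValueError("target must be non-empty")
    if all(not isinstance(item, list) for item in target):
        return list(target)  # already one-dimensional
    if len(target) == 1:
        return list(target[0])  # single row: [[a, b, c]]
    if all(isinstance(row, list) and len(row) == 1 for row in target):
        return [row[0] for row in target]  # single column: [[a], [b], [c]]
    raise ValueError("target must be a single row or column")


row_labels = flatten_target([[False, False, False, True, True, True]])
column_labels = flatten_target([["low"], ["low"], ["high"]])
print(row_labels, column_labels)
```

Either orientation yields the same flat list, so a row range and a column range holding the same labels describe the same target.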

Python Code

import numpy as np
from sklearn.naive_bayes import GaussianNB as SklearnGaussianNB

def gaussian_nb(data, target, var_smoothing=1e-09):
    """
    Fit a Gaussian naive Bayes classifier and return training predictions.

    See: https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.GaussianNB.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        data (list[list]): 2D array of numeric feature data with rows as samples and columns as features.
        target (list[list]): Target labels as a single row or single column, one label per sample, covering at least two classes.
        var_smoothing (float, optional): Fraction of the largest feature variance added to every per-class variance for numerical stability; must be greater than 0. Default is 1e-09.

    Returns:
        dict: Excel data type containing training accuracy, predictions, probabilities, and fitted Gaussian parameter arrays.
    """
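    def py(value):
        # Convert NumPy scalars to native Python values; leave everything else unchanged.
        return value.item() if isinstance(value, np.generic) else value

    def cell(value):
        # Wrap a scalar in an Excel data-type cell (Boolean, Double, or String).
        value = py(value)
        # bool is checked first because it is a subclass of int.
        if isinstance(value, bool):
            return {"type": "Boolean", "basicValue": value}
        if isinstance(value, (int, float)):
            return {"type": "Double", "basicValue": float(value)}
        return {"type": "String", "basicValue": str(value)}

    def col(values):
        # Build a one-column array of cells.
        return [[cell(value)] for value in values]

    def mat(values):
        # Build a 2D array of cells from a list of rows.
        return [[cell(value) for value in row] for row in values]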

    def parse_data(value):
        value = [[value]] if not isinstance(value, list) else value
        if not isinstance(value, list) or not value or not all(isinstance(row, list) and row for row in value):
            return None, "Error: data must be a non-empty 2D list"
        if len({len(row) for row in value}) != 1:
            return None, "Error: data must be a rectangular 2D list"
        data_np = np.array(value, dtype=float)
        if data_np.ndim != 2 or data_np.size == 0:
            return None, "Error: data must be a non-empty 2D list"
        if not np.isfinite(data_np).all():
            return None, "Error: data must contain only finite numeric values"
        return data_np, None

    def parse_target(value, sample_count):
        if not isinstance(value, list):
            labels = [value]
        elif not value:
            return None, "Error: target must be non-empty"
        elif all(not isinstance(item, list) for item in value):
            labels = value
        elif len(value) == 1:
            labels = value[0]
        elif all(isinstance(row, list) and len(row) == 1 for row in value):
            labels = [row[0] for row in value]
        else:
            return None, "Error: target must be a single row or column"

        if len(labels) != sample_count:
            return None, "Error: target length must match sample count"

        parsed = []
        classes = []
        for item in labels:
            item = py(item)
            if isinstance(item, str):
                if not item.strip():
                    return None, "Error: target labels must not be blank"
            elif isinstance(item, bool):
                item = bool(item)
            elif isinstance(item, (int, float)) and not isinstance(item, bool):
                if not np.isfinite(float(item)):
                    return None, "Error: target labels must be finite"
                # Store every numeric label as float so mixed int/float targets
                # keep the same type as the predictions returned after fitting.
                item = float(item)
            else:
                return None, "Error: target labels must be scalar string, boolean, or numeric values"
            parsed.append(item)
            if not any(type(existing) is type(item) and existing == item for existing in classes):
                classes.append(item)

        if len(classes) < 2:
            return None, "Error: target must contain at least 2 classes"
        return parsed, None

    def count_table(predictions, classes):
        rows = [[{"type": "String", "basicValue": "class"}, {"type": "String", "basicValue": "count"}]]
        for class_label in classes:
            count = sum(type(prediction) is type(class_label) and prediction == class_label for prediction in predictions)
            rows.append([cell(class_label), {"type": "Double", "basicValue": float(count)}])
        return rows

    try:
        data_np, error = parse_data(data)
        if error:
            return error

        target_values, error = parse_target(target, data_np.shape[0])
        if error:
            return error

        if float(var_smoothing) <= 0:
            return "Error: var_smoothing must be greater than 0"

        fitted = SklearnGaussianNB(var_smoothing=float(var_smoothing)).fit(data_np, target_values)

        prediction_array = fitted.predict(data_np)
        predictions = [py(item) for item in prediction_array.tolist()]
        classes = [py(item) for item in fitted.classes_.tolist()]
        accuracy = float(np.mean([
            type(prediction) is type(actual) and prediction == actual
            for prediction, actual in zip(predictions, target_values)
        ]))

        return {
            "type": "Double",
            "basicValue": accuracy,
            "properties": {
                "accuracy": {"type": "Double", "basicValue": accuracy},
                "sample_count": {"type": "Double", "basicValue": float(data_np.shape[0])},
                "feature_count": {"type": "Double", "basicValue": float(data_np.shape[1])},
                "class_count": {"type": "Double", "basicValue": float(len(classes))},
                "classes": {"type": "Array", "elements": col(classes)},
                "predictions": {"type": "Array", "elements": col(predictions)},
                "prediction_counts": {"type": "Array", "elements": count_table(predictions, classes)},
                "probabilities": {"type": "Array", "elements": mat(fitted.predict_proba(data_np).tolist())},
                "class_priors": {"type": "Array", "elements": col(fitted.class_prior_.tolist())},
                "theta": {"type": "Array", "elements": mat(fitted.theta_.tolist())},
                "variances": {"type": "Array", "elements": mat(fitted.var_.tolist())}
            }
        }
    except Exception as e:
        return f"Error: {str(e)}"
