VARIANCE_SEL
Variance threshold feature selection removes columns whose training-set variance falls below a chosen cutoff. It is a simple unsupervised filter for dropping constant or nearly constant features before modeling.
The sample variance for a feature X with n samples is computed as:
\text{Var}(X) = \frac{1}{n} \sum_{i=1}^n (x_i - \mu)^2
where \mu is the mean of the feature.
This wrapper accepts rows as samples and columns as features. It returns the filtered matrix together with per-feature variances, a boolean support mask, and the 1-based indices of the selected columns.
Excel Usage
=VARIANCE_SEL(data, threshold)
data(list[list], required): 2D array of numeric input data with rows as samples and columns as features.threshold(float, optional, default: 0): Minimum variance required for a feature to be kept.
Returns (dict): Excel data type containing the selected feature matrix, support mask, and per-feature variances.
Example 1: Remove constant columns with the default threshold
Inputs:
| data | threshold | |||
|---|---|---|---|---|
| 1 | 0 | 5 | 10 | 0 |
| 1 | 1 | 5 | 11 | |
| 1 | 0 | 5 | 12 | |
| 1 | 1 | 5 | 13 |
Excel formula:
=VARIANCE_SEL({1,0,5,10;1,1,5,11;1,0,5,12;1,1,5,13}, 0)
Expected output:
{"type":"Double","basicValue":2,"properties":{"selected_feature_count":{"type":"Double","basicValue":2},"sample_count":{"type":"Double","basicValue":4},"feature_count":{"type":"Double","basicValue":4},"selected_indices":{"type":"Array","elements":[[{"type":"Double","basicValue":2}],[{"type":"Double","basicValue":4}]]},"support_mask":{"type":"Array","elements":[[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":true}],[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":true}]]},"variances":{"type":"Array","elements":[[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":0.25}],[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1.25}]]},"selected_data":{"type":"Array","elements":[[{"type":"Double","basicValue":0},{"type":"Double","basicValue":10}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":11}],[{"type":"Double","basicValue":0},{"type":"Double","basicValue":12}],[{"type":"Double","basicValue":1},{"type":"Double","basicValue":13}]]}}}
Example 2: Drop weakly varying columns with a positive threshold
Inputs:
| data | threshold | ||
|---|---|---|---|
| 0 | 1 | 10 | 0.01 |
| 0.1 | 1.1 | 11 | |
| 0 | 0.9 | 12 | |
| 0.1 | 1 | 13 | |
| 0 | 1.1 | 14 |
Excel formula:
=VARIANCE_SEL({0,1,10;0.1,1.1,11;0,0.9,12;0.1,1,13;0,1.1,14}, 0.01)
Expected output:
{"type":"Double","basicValue":1,"properties":{"selected_feature_count":{"type":"Double","basicValue":1},"sample_count":{"type":"Double","basicValue":5},"feature_count":{"type":"Double","basicValue":3},"selected_indices":{"type":"Array","elements":[[{"type":"Double","basicValue":3}]]},"support_mask":{"type":"Array","elements":[[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":true}]]},"variances":{"type":"Array","elements":[[{"type":"Double","basicValue":0.0024}],[{"type":"Double","basicValue":0.0056}],[{"type":"Double","basicValue":2}]]},"selected_data":{"type":"Array","elements":[[{"type":"Double","basicValue":10}],[{"type":"Double","basicValue":11}],[{"type":"Double","basicValue":12}],[{"type":"Double","basicValue":13}],[{"type":"Double","basicValue":14}]]}}}
Example 3: Keep a single varying feature column
Inputs:
| data | threshold |
|---|---|
| 2 | 0 |
| 3 | |
| 5 | |
| 8 | |
| 13 |
Excel formula:
=VARIANCE_SEL({2;3;5;8;13}, 0)
Expected output:
{"type":"Double","basicValue":1,"properties":{"selected_feature_count":{"type":"Double","basicValue":1},"sample_count":{"type":"Double","basicValue":5},"feature_count":{"type":"Double","basicValue":1},"selected_indices":{"type":"Array","elements":[[{"type":"Double","basicValue":1}]]},"support_mask":{"type":"Array","elements":[[{"type":"Boolean","basicValue":true}]]},"variances":{"type":"Array","elements":[[{"type":"Double","basicValue":11}]]},"selected_data":{"type":"Array","elements":[[{"type":"Double","basicValue":2}],[{"type":"Double","basicValue":3}],[{"type":"Double","basicValue":5}],[{"type":"Double","basicValue":8}],[{"type":"Double","basicValue":13}]]}}}
Example 4: Keep only the highest-variance columns with a larger threshold
Inputs:
| data | threshold | ||
|---|---|---|---|
| 10 | 0 | 100 | 30 |
| 12 | 0 | 110 | |
| 14 | 0 | 130 | |
| 16 | 0 | 160 | |
| 18 | 0 | 200 |
Excel formula:
=VARIANCE_SEL({10,0,100;12,0,110;14,0,130;16,0,160;18,0,200}, 30)
Expected output:
{"type":"Double","basicValue":1,"properties":{"selected_feature_count":{"type":"Double","basicValue":1},"sample_count":{"type":"Double","basicValue":5},"feature_count":{"type":"Double","basicValue":3},"selected_indices":{"type":"Array","elements":[[{"type":"Double","basicValue":3}]]},"support_mask":{"type":"Array","elements":[[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":false}],[{"type":"Boolean","basicValue":true}]]},"variances":{"type":"Array","elements":[[{"type":"Double","basicValue":8}],[{"type":"Double","basicValue":0}],[{"type":"Double","basicValue":1320}]]},"selected_data":{"type":"Array","elements":[[{"type":"Double","basicValue":100}],[{"type":"Double","basicValue":110}],[{"type":"Double","basicValue":130}],[{"type":"Double","basicValue":160}],[{"type":"Double","basicValue":200}]]}}}
Python Code
import numpy as np
from sklearn.feature_selection import VarianceThreshold as SklearnVarianceThreshold
def variance_sel(data, threshold=0):
"""
Remove low-variance features and return the selected columns with variance summaries.
See: https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.VarianceThreshold.html
This example function is provided as-is without any representation of accuracy.
Args:
data (list[list]): 2D array of numeric input data with rows as samples and columns as features.
threshold (float, optional): Minimum variance required for a feature to be kept. Default is 0.
Returns:
dict: Excel data type containing the selected feature matrix, support mask, and per-feature variances.
"""
def py(value):
return value.item() if isinstance(value, np.generic) else value
def cell(value):
value = py(value)
if isinstance(value, bool):
return {"type": "Boolean", "basicValue": bool(value)}
if isinstance(value, (int, float)) and not isinstance(value, bool):
return {"type": "Double", "basicValue": float(value)}
return {"type": "String", "basicValue": str(value)}
def col(values):
return [[cell(value)] for value in values]
def mat(values):
return [[cell(value) for value in row] for row in values]
def parse_data(value):
value = [[value]] if not isinstance(value, list) else value
if not isinstance(value, list) or not value or not all(isinstance(row, list) and row for row in value):
return None, "Error: data must be a non-empty 2D list"
if len({len(row) for row in value}) != 1:
return None, "Error: data must be a rectangular 2D list"
data_np = np.array(value, dtype=float)
if data_np.ndim != 2 or data_np.size == 0:
return None, "Error: data must be a non-empty 2D list"
if not np.isfinite(data_np).all():
return None, "Error: data must contain only finite numeric values"
return data_np, None
try:
data_np, error = parse_data(data)
if error:
return error
threshold_value = float(threshold)
if threshold_value < 0:
return "Error: threshold must be non-negative"
fitted = SklearnVarianceThreshold(threshold=threshold_value)
transformed_np = np.asarray(fitted.fit_transform(data_np), dtype=float)
variances = np.atleast_1d(np.asarray(fitted.variances_, dtype=float))
support_mask = fitted.get_support().tolist()
selected_indices = [index + 1 for index, keep in enumerate(support_mask) if keep]
return {
"type": "Double",
"basicValue": float(len(selected_indices)),
"properties": {
"selected_feature_count": {"type": "Double", "basicValue": float(len(selected_indices))},
"sample_count": {"type": "Double", "basicValue": float(data_np.shape[0])},
"feature_count": {"type": "Double", "basicValue": float(data_np.shape[1])},
"selected_indices": {"type": "Array", "elements": col(selected_indices)},
"support_mask": {"type": "Array", "elements": col(support_mask)},
"variances": {"type": "Array", "elements": col(variances.tolist())},
"selected_data": {"type": "Array", "elements": mat(transformed_np.tolist())}
}
}
except Exception as e:
return f"Error: {str(e)}"