DISTANCE_CORR
This function computes distance correlation, a dependence measure that detects both linear and non-linear associations between variables.
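In the population, distance correlation is zero if and only if the variables are independent (assuming finite first moments). Pearson correlation, by contrast, can be zero for clearly dependent variables such as y = x² with x symmetric about zero, so distance correlation is the more general test of dependence. Its values lie between 0 and 1 and carry no sign, so a perfectly decreasing linear relationship also scores 1.
The sample statistic is computed from the pairwise distance matrices of x and y. Each matrix is double-centered (row and column means subtracted, grand mean added back), the squared distance covariance is the average of the elementwise products of the two centered matrices, and the distance correlation is the square root of that quantity divided by the geometric mean of the two squared distance variances (each variable's distance covariance with itself).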
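Written out for n paired observations, assuming the usual biased (V-statistic) form of the sample statistic, which reproduces the example outputs on this page. Here ā with a dot in place of an index denotes the mean over that index (row mean, column mean, grand mean):

$$a_{ij} = |x_i - x_j|, \qquad A_{ij} = a_{ij} - \bar a_{i\cdot} - \bar a_{\cdot j} + \bar a_{\cdot\cdot}$$

$$b_{ij} = |y_i - y_j|, \qquad B_{ij} = b_{ij} - \bar b_{i\cdot} - \bar b_{\cdot j} + \bar b_{\cdot\cdot}$$

$$\mathrm{dCov}_n^2(x, y) = \frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} A_{ij} B_{ij}, \qquad \mathrm{dVar}_n^2(x) = \mathrm{dCov}_n^2(x, x)$$

$$\mathrm{dCor}_n(x, y) = \sqrt{\frac{\mathrm{dCov}_n^2(x, y)}{\sqrt{\mathrm{dVar}_n^2(x)\, \mathrm{dVar}_n^2(y)}}}$$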
Excel Usage
=DISTANCE_CORR(x, y, n_boot, seed)
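x (list[list], required): 2D range of numeric values for the first variable. Row and column ranges are both accepted and are flattened into a single list; non-numeric cells are skipped.
y (list[list], required): 2D range of numeric values for the second variable. It must contain the same number of numeric values as x, and at least two.
n_boot (int, optional, default: 0): Number of resampling iterations used to estimate a p-value; set 0 to skip. The p-value is not returned (the function always returns only the coefficient), so this setting does not change the result.
seed (int, optional, default: null): Optional random seed that makes the resampling reproducible.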
Returns (float): Distance correlation coefficient.
Example 1: Strong positive linear dependence
Inputs:
| x | y |
|---|---|
| 1 | 2 |
| 2 | 4 |
| 3 | 6 |
| 4 | 8 |
| 5 | 10 |
| 6 | 12 |
Excel formula:
=DISTANCE_CORR({1;2;3;4;5;6}, {2;4;6;8;10;12})
Expected output:
1
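Example 1 comes out as exactly 1 because distance correlation is unchanged when one variable is a linear function of the other. With a_ij = |x_i − x_j| and b_ij = |y_i − y_j| the pairwise distances, A and B their double-centered versions, and x not constant, a short derivation:

$$y_i = c\,x_i + d,\; c \neq 0 \;\Rightarrow\; b_{ij} = |c|\,a_{ij} \;\Rightarrow\; B_{ij} = |c|\,A_{ij}$$

$$\mathrm{dCov}_n^2(x, y) = |c|\,\mathrm{dVar}_n^2(x), \qquad \mathrm{dVar}_n^2(y) = c^2\,\mathrm{dVar}_n^2(x)$$

$$\mathrm{dCor}_n^2(x, y) = \frac{|c|\,\mathrm{dVar}_n^2(x)}{\sqrt{\mathrm{dVar}_n^2(x)\cdot c^2\,\mathrm{dVar}_n^2(x)}} = 1$$

Here c = 2 and d = 0. Because only |c| enters, a decreasing relationship (c < 0) gives 1 as well, which is why Example 4 also returns 1.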
Example 2: Nonlinear monotonic dependence
Inputs:
| x | y |
|---|---|
| 1 | 1 |
| 2 | 4 |
| 3 | 9 |
| 4 | 16 |
| 5 | 25 |
| 6 | 36 |
Excel formula:
=DISTANCE_CORR({1;2;3;4;5;6}, {1;4;9;16;25;36})
Expected output:
0.98631
Example 3: Distance correlation with bootstrap setting
Inputs:
| x | y | n_boot | seed |
|---|---|---|---|
| 1 | 1 | 100 | 42 |
| 2 | 3 | | |
| 3 | 2 | | |
| 4 | 5 | | |
| 5 | 4 | | |
| 6 | 6 | | |
Excel formula:
=DISTANCE_CORR({1;2;3;4;5;6}, {1;3;2;5;4;6}, 100, 42)
Expected output:
0.884874
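Example 3 returns the same coefficient it would without resampling: n_boot and seed only feed pingouin's p-value estimate, which this function discards. To make the role of the resampling concrete, here is a minimal NumPy sketch of a permutation test. It is an illustration under stated assumptions, not pingouin's code: the helper names are hypothetical, y is shuffled while x stays fixed, and the p-value uses the add-one convention (1 + count) / (1 + n_perm). pingouin's own resampling scheme and p-value formula may differ in detail, so p-values will not match exactly.

```python
import numpy as np


def _centered(v: np.ndarray) -> np.ndarray:
    """Double-centered distance matrix |v_i - v_j| of a 1D sample (hypothetical helper)."""
    d = np.abs(v[:, None] - v[None, :])
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def _dcor_from_centered(a: np.ndarray, b: np.ndarray) -> float:
    # The 1/n**2 factors cancel in the ratio, so plain sums are enough.
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return 0.0 if denom == 0 else float(np.sqrt(max((a * b).sum(), 0.0) / denom))


def dcor_permutation_test(x, y, n_perm=1000, seed=None):
    """Return (dcor, p_value) from a permutation test that shuffles y.

    Hypothetical helper for illustration; not part of pingouin.
    """
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    a = _centered(x)  # x never moves, so center it only once
    observed = _dcor_from_centered(a, _centered(y))
    rng = np.random.default_rng(seed)
    # Shuffling y breaks any dependence on x, giving draws under independence.
    null = np.array([_dcor_from_centered(a, _centered(rng.permutation(y)))
                     for _ in range(n_perm)])
    # Add-one convention: the p-value can never be exactly 0.
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, float(p_value)


obs, p = dcor_permutation_test([1, 2, 3, 4, 5, 6], [1, 3, 2, 5, 4, 6], n_perm=200, seed=42)
print(round(obs, 6))  # 0.884874
```

Only the p-value depends on n_perm and seed; the observed coefficient is computed once from the unshuffled data, mirroring why Example 3's output does not change with the bootstrap setting.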
Example 4: Row vector inputs are flattened
Inputs:
| x | | | | | y | | | | |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 2 | 3 | 4 | 5 | 5 | 4 | 3 | 2 | 1 |
Excel formula:
=DISTANCE_CORR({1,2,3,4,5}, {5,4,3,2,1})
Expected output:
1
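To check the example outputs without installing pingouin, the statistic can be computed directly from double-centered distance matrices. This is a from-scratch NumPy sketch, assuming the biased (V-statistic) form of the sample distance correlation; under that assumption it reproduces the expected outputs of Examples 1 to 4. The function names are hypothetical, and inputs of any shape are flattened to mirror how Example 4's row vectors are handled.

```python
import numpy as np


def double_centered_distances(v: np.ndarray) -> np.ndarray:
    """Pairwise |v_i - v_j| with row and column means removed and the grand mean added back."""
    d = np.abs(v[:, None] - v[None, :])
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def distance_correlation(x, y) -> float:
    """Sample distance correlation (V-statistic form); hypothetical helper, not pingouin's API."""
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    if x.size != y.size or x.size < 2:
        raise ValueError("x and y need the same number of values (at least two)")
    a = double_centered_distances(x)
    b = double_centered_distances(y)
    n2 = x.size ** 2
    dcov2_xy = (a * b).sum() / n2  # squared distance covariance
    dvar2_x = (a * a).sum() / n2   # squared distance variance of x
    dvar2_y = (b * b).sum() / n2   # squared distance variance of y
    denom = np.sqrt(dvar2_x * dvar2_y)
    if denom == 0:
        return 0.0  # a constant input has zero distance variance
    # max(..., 0.0) guards against tiny negative round-off before the square root
    return float(np.sqrt(max(dcov2_xy, 0.0) / denom))


print(round(distance_correlation([1, 2, 3, 4, 5, 6], [1, 4, 9, 16, 25, 36]), 5))  # 0.98631
print(round(distance_correlation([[1, 2, 3, 4, 5]], [[5, 4, 3, 2, 1]]), 6))  # 1.0
```

Building full n × n matrices costs O(n²) time and memory, which is fine for spreadsheet-sized ranges but becomes expensive for very large samples.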
Python Code
import pingouin as pg


def distance_corr(x, y, n_boot=0, seed=None):
    """
    Compute distance correlation between two numeric variables.

    See: https://pingouin-stats.org/generated/pingouin.distance_corr.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        x (list[list]): 2D range of numeric values for the first variable.
        y (list[list]): 2D range of numeric values for the second variable.
        n_boot (int, optional): Number of bootstrap samples for p-value estimation; set 0 to skip bootstrap. Default is 0.
        seed (int, optional): Optional random seed for bootstrap reproducibility. Default is None.

    Returns:
        float: Distance correlation coefficient.
    """
    try:
        def to2d(value):
            # A single cell arrives as a scalar; wrap it so it looks like a 2D range.
            return [[value]] if not isinstance(value, list) else value

        def flatten_numeric(matrix):
            # Flatten row by row, skipping cells that cannot be read as numbers.
            # Cells are dropped independently in x and y, so pairs stay aligned
            # only if non-numeric cells occur in the same positions in both ranges.
            values = []
            for row in matrix:
                if not isinstance(row, list):
                    return None
                for item in row:
                    try:
                        values.append(float(item))
                    except (TypeError, ValueError):
                        continue
            return values

        x_matrix = to2d(x)
        y_matrix = to2d(y)
        if not isinstance(x_matrix, list) or not isinstance(y_matrix, list):
            return "Error: Invalid input - x and y must be 2D lists"

        x_values = flatten_numeric(x_matrix)
        y_values = flatten_numeric(y_matrix)
        if x_values is None or y_values is None:
            return "Error: Invalid input - x and y must be 2D lists"
        if len(x_values) != len(y_values):
            return "Error: Invalid input - x and y must contain the same number of numeric values"
        if len(x_values) < 2:
            return "Error: Invalid input - x and y must each contain at least two numeric values"

        # pingouin skips resampling when n_boot is None; treat a blank, 0, or negative value as "skip".
        seed_arg = None if seed is None else int(seed)
        n_boot_arg = None if n_boot is None or int(n_boot) <= 0 else int(n_boot)
        result = pg.distance_corr(x_values, y_values, n_boot=n_boot_arg, seed=seed_arg)

        # With resampling, pingouin returns (dcor, pval); keep only the coefficient.
        if isinstance(result, (tuple, list)):
            return float(result[0])
        return float(result)
    except Exception as e:
        return f"Error: {str(e)}"