DISTANCE_CORR

This function computes distance correlation, a dependence measure that detects both linear and non-linear associations between variables.

The population distance correlation ranges from 0 to 1 and is zero if and only if the variables are independent (given finite first moments). Pearson correlation, by contrast, can be zero for dependent variables, so distance correlation is the more general measure of dependence.

The sample statistic double-centers the pairwise Euclidean distance matrices of x and y, then normalizes the resulting distance covariance by the distance variances of the two samples.
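
To make the centering and normalization steps concrete, here is a minimal pure-Python sketch of the biased (V-statistic) sample distance correlation for two one-dimensional samples. The name distance_corr_sketch is hypothetical, and the sketch is meant only to illustrate the definition; it is not the pingouin implementation that the function on this page calls.

```python
import math

def distance_corr_sketch(x, y):
    """Biased sample distance correlation for two 1-D samples (illustrative only)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")

    def centered_distances(v):
        # Pairwise absolute distances d[i][j] = |v_i - v_j|.
        d = [[abs(vi - vj) for vj in v] for vi in v]
        row_means = [sum(row) / n for row in d]
        grand_mean = sum(row_means) / n
        # Double centering; d is symmetric, so column means equal row means.
        return [[d[i][j] - row_means[i] - row_means[j] + grand_mean
                 for j in range(n)] for i in range(n)]

    def mean_product(p, q):
        # Average of the element-wise product of two n x n matrices.
        return sum(p[i][j] * q[i][j] for i in range(n) for j in range(n)) / (n * n)

    A = centered_distances(x)
    B = centered_distances(y)
    dcov2_xy = mean_product(A, B)   # squared distance covariance
    dcov2_xx = mean_product(A, A)   # squared distance variance of x
    dcov2_yy = mean_product(B, B)   # squared distance variance of y
    denom = math.sqrt(dcov2_xx * dcov2_yy)
    if denom == 0:
        return 0.0  # a constant sample has zero distance variance
    return math.sqrt(max(dcov2_xy, 0.0) / denom)
```

For an exactly linear pair such as x = 1..6 and y = 2x, the two centered matrices are proportional, so the ratio under the square root is 1 up to floating-point rounding.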

Excel Usage

=DISTANCE_CORR(x, y, n_boot, seed)
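  • x (list[list], required): 2D range of numeric values for the first variable. The range is flattened row by row, and non-numeric cells are skipped.
  • y (list[list], required): 2D range of numeric values for the second variable, flattened the same way. It must yield the same number of numeric values as x (at least two).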
  • n_boot (int, optional, default: 0): Number of bootstrap samples for p-value estimation; set 0 to skip bootstrap.
  • seed (int, optional, default: null): Optional random seed for bootstrap reproducibility.

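Returns (float): Distance correlation coefficient. Any bootstrap p-value is computed but not returned, so n_boot and seed do not change the result. Invalid input yields an error message string instead.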
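
The n_boot and seed parameters relate to a resampling-based p-value. As a rough illustration of the general idea, the sketch below shuffles y repeatedly and counts how often the shuffled statistic reaches the observed one. This is an assumption about the technique in general: the names dcor_1d and permutation_pvalue are hypothetical, and pingouin's internal procedure and random number generator may differ, so the p-values will not necessarily match.

```python
import math
import random

def dcor_1d(x, y):
    # Compact biased sample distance correlation for 1-D samples.
    n = len(x)

    def centered(v):
        d = [[abs(a - b) for b in v] for a in v]
        rm = [sum(row) / n for row in d]
        gm = sum(rm) / n
        return [[d[i][j] - rm[i] - rm[j] + gm for j in range(n)] for i in range(n)]

    def mean_prod(p, q):
        return sum(p[i][j] * q[i][j] for i in range(n) for j in range(n)) / (n * n)

    A, B = centered(x), centered(y)
    denom = math.sqrt(mean_prod(A, A) * mean_prod(B, B))
    return math.sqrt(max(mean_prod(A, B), 0.0) / denom) if denom else 0.0

def permutation_pvalue(x, y, n_boot=100, seed=None):
    """Return (observed statistic, share of shuffled replicates that reach it)."""
    rng = random.Random(seed)  # a fixed seed makes the shuffles repeatable
    observed = dcor_1d(x, y)
    y_perm = list(y)
    hits = 0
    for _ in range(n_boot):
        rng.shuffle(y_perm)  # break the pairing between x and y
        if dcor_1d(x, y_perm) >= observed:
            hits += 1
    return observed, hits / n_boot
```

Calling permutation_pvalue twice with the same seed returns the same pair, which is the reproducibility that a seed argument is meant to provide.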

Example 1: Strong positive linear dependence

Inputs:

x y
1 2
2 4
3 6
4 8
5 10
6 12

Excel formula:

=DISTANCE_CORR({1;2;3;4;5;6}, {2;4;6;8;10;12})

Expected output:

1

Example 2: Nonlinear monotonic dependence

Inputs:

x y
1 1
2 4
3 9
4 16
5 25
6 36

Excel formula:

=DISTANCE_CORR({1;2;3;4;5;6}, {1;4;9;16;25;36})

Expected output:

0.98631

Example 3: Distance correlation with bootstrap setting

Inputs:

x y n_boot seed
1 1 100 42
2 3
3 2
4 5
5 4
6 6

Excel formula:

=DISTANCE_CORR({1;2;3;4;5;6}, {1;3;2;5;4;6}, 100, 42)

Expected output:

0.884874

Example 4: Row vector inputs are flattened

Inputs:

x  1 2 3 4 5
y  5 4 3 2 1

Excel formula:

=DISTANCE_CORR({1,2,3,4,5}, {5,4,3,2,1})

Expected output:

1
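
The flattening in Example 4 can be pictured with a short sketch. It assumes that Excel ranges reach Python as lists of rows, which is what the list[list] parameter type suggests; flatten_range is a hypothetical helper and omits the non-numeric filtering done by the real function.

```python
def flatten_range(matrix):
    """Flatten a 2D range row by row into a single list of floats."""
    return [float(cell) for row in matrix for cell in row]

row_vector = [[1, 2, 3, 4, 5]]             # Excel {1,2,3,4,5}
column_vector = [[1], [2], [3], [4], [5]]  # Excel {1;2;3;4;5}

# Both orientations produce the same flat sequence.
print(flatten_range(row_vector) == flatten_range(column_vector))  # prints True
```

A rectangular block such as [[1, 2], [3, 4]] is read in row-major order, giving 1, 2, 3, 4.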

Python Code

import pingouin as pg

def distance_corr(x, y, n_boot=0, seed=None):
    """
    Compute distance correlation between two numeric variables.

    See: https://pingouin-stats.org/generated/pingouin.distance_corr.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        x (list[list]): 2D range of numeric values for the first variable.
        y (list[list]): 2D range of numeric values for the second variable.
        n_boot (int, optional): Number of bootstrap samples for p-value estimation; set 0 to skip bootstrap. Default is 0.
        seed (int, optional): Optional random seed for bootstrap reproducibility. Default is None.

    Returns:
        float: Distance correlation coefficient.
    """
    try:
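        def to2d(value):
            # A single cell may arrive as a scalar; wrap it as a 1x1 range.
            return [[value]] if not isinstance(value, list) else value

        def flatten_numeric(matrix):
            # Flatten row by row, keeping only cells that convert to float.
            values = []
            for row in matrix:
                if not isinstance(row, list):
                    return None  # not a 2D list
                for item in row:
                    try:
                        values.append(float(item))
                    except (TypeError, ValueError):
                        continue  # skip blanks, text, and other non-numeric cells
            return values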

        x_matrix = to2d(x)
        y_matrix = to2d(y)

        # to2d always returns a list, so only the row structure needs checking.
        x_values = flatten_numeric(x_matrix)
        y_values = flatten_numeric(y_matrix)
        if x_values is None or y_values is None:
            return "Error: Invalid input - x and y must be 2D lists"

        if len(x_values) != len(y_values):
            return "Error: Invalid input - x and y must contain the same number of numeric values"
        if len(x_values) < 2:
            return "Error: Invalid input - x and y must each contain at least two numeric values"

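        seed_arg = None if seed is None else int(seed)
        # pingouin skips the bootstrap and returns only the coefficient when n_boot is None.
        n_boot_arg = None if n_boot is None or int(n_boot) <= 0 else int(n_boot)

        result = pg.distance_corr(x_values, y_values, n_boot=n_boot_arg, seed=seed_arg)

        # With bootstrapping, pingouin returns (dcor, pval); only the coefficient is kept.
        if isinstance(result, (tuple, list)):
            return float(result[0])
        return float(result)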
    except Exception as e:
        return f"Error: {str(e)}"
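
Because flatten_numeric drops non-numeric cells from x and y independently, a blank in one range and a text cell in the other can leave both lists the same length while shifting the pairs out of alignment. The sketch below shows one possible alternative that filters by position instead. It is not part of the function above; to_float and pairwise_numeric are hypothetical helpers that operate on already flattened cell lists.

```python
def to_float(cell):
    """Return the cell as a float, or None when it is not numeric."""
    try:
        return float(cell)
    except (TypeError, ValueError):
        return None

def pairwise_numeric(x_cells, y_cells):
    """Keep only positions where both cells are numeric, preserving the pairing."""
    if len(x_cells) != len(y_cells):
        raise ValueError("x and y must have the same number of cells")
    pairs = [(to_float(a), to_float(b)) for a, b in zip(x_cells, y_cells)]
    kept = [(a, b) for a, b in pairs if a is not None and b is not None]
    return [a for a, _ in kept], [b for _, b in kept]

x_cells = [1, None, 3, 4]
y_cells = [10, 20, "n/a", 40]
print(pairwise_numeric(x_cells, y_cells))  # ([1.0, 4.0], [10.0, 40.0])
```

Filtering each list on its own would instead give x = 1, 3, 4 and y = 10, 20, 40, pairing 3 with 20 even though those values came from different rows.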
