MEDIATION_ANALYSIS

This function estimates mediation effects to quantify how much of the relationship between a predictor and an outcome is explained through one or more mediator variables.

The total effect of predictor X on outcome Y is decomposed into a direct effect and an indirect effect through mediator M:

c = c' + ab

where c is the total effect, c' is the direct effect, and ab is the indirect effect. Confidence intervals and p-values for indirect effects are estimated using non-parametric bootstrap resampling.

Input data is provided as an Excel range where the first row contains column names and subsequent rows contain observations.

Excel Usage

=MEDIATION_ANALYSIS(data, x, m, y, covar, alpha, n_boot, seed)

data (list[list], required): Input table as a 2D range with header row followed by observation rows.
x (str, required): Predictor column name.
m (str, required): Mediator column name, or comma-separated mediator column names for parallel mediation.
y (str, required): Outcome column name.
covar (str, optional, default: null): Optional covariate column name, or comma-separated covariate names.
alpha (float, optional, default: 0.05): Significance level for confidence intervals (unitless).
n_boot (int, optional, default: 500): Number of bootstrap resamples used for indirect effect inference.
seed (int, optional, default: null): Optional random seed for reproducible bootstrap sampling.

Returns (list[list]): Mediation summary table with path coefficients, uncertainty metrics, confidence intervals, p-values, and significance labels.

Example 1: Single mediator with deterministic linear trend

Inputs:

data			x	m	y	alpha	n_boot	seed
X	M	Y	X	M	Y	0.05	200	42
1	2.2	2.3
2	2.8	3.1
3	3.9	4.3
4	5	5.2
5	5.9	6
6	7.1	7.4
7	8.2	8
8	8.8	9.2
9	9.9	10.1
10	11.1	11.4

Excel formula:

=MEDIATION_ANALYSIS({"X","M","Y";1,2.2,2.3;2,2.8,3.1;3,3.9,4.3;4,5,5.2;5,5.9,6;6,7.1,7.4;7,8.2,8;8,8.8,9.2;9,9.9,10.1;10,11.1,11.4}, "X", "M", "Y", 0.05, 200, 42)

Expected output:

path	coef	se	pval	CI2.5	CI97.5	sig
M ~ X	1.00061	0.0177938	1.10999e-11	0.959573	1.04164	Yes
Y ~ M	0.998804	0.0208822	4.03763e-11	0.95065	1.04696	Yes
Total	1.00121	0.0174025	9.24967e-12	0.961082	1.04134	Yes
Direct	0.714205	0.354058	0.0834744	-0.123008	1.55142	No
Indirect	0.287007	0.521284	0.47	-0.47981	1.01255	No

Example 2: Single mediator with one covariate

Inputs:

data				x	m	y	covar	alpha	n_boot	seed
X	M	Y	C	X	M	Y	C	0.05	200	7
1	2.2	2.3	0
2	2.8	3.1	1
3	3.9	4.3	0
4	5	5.2	1
5	5.9	6	0
6	7.1	7.4	1
7	8.2	8	0
8	8.8	9.2	1
9	9.9	10.1	0
10	11.1	11.4	1

Excel formula:

=MEDIATION_ANALYSIS({"X","M","Y","C";1,2.2,2.3,0;2,2.8,3.1,1;3,3.9,4.3,0;4,5,5.2,1;5,5.9,6,0;6,7.1,7.4,1;7,8.2,8,0;8,8.8,9.2,1;9,9.9,10.1,0;10,11.1,11.4,1}, "X", "M", "Y", "C", 0.05, 200, 7)

Expected output:

path	coef	se	pval	CI2.5	CI97.5	sig
M ~ X	1.0025	0.0188746	2.19775e-10	0.957869	1.04713	Yes
Y ~ M	0.993524	0.0190439	2.49089e-10	0.948492	1.03856	Yes
Total	0.9975	0.017087	1.13571e-10	0.957096	1.0379	Yes
Direct	0.603033	0.334092	0.121113	-0.214461	1.42053	No
Indirect	0.394467	0.408463	0.36	-0.266288	1.49213	No

Example 3: Two parallel mediators using comma-separated names

Inputs:

data				x	m	y	alpha	n_boot	seed
X	M	Mtwo	Y	X	M, Mtwo	Y	0.05	150	5
1	2.2	0.3	2.3
2	2.8	0.4	3.1
3	3.9	0.8	4.3
4	5	1	5.2
5	5.9	1.4	6
6	7.1	1.8	7.4
7	8.2	2.2	8
8	8.8	2.4	9.2
9	9.9	2.7	10.1
10	11.1	3.2	11.4

Excel formula:

=MEDIATION_ANALYSIS({"X","M","Mtwo","Y";1,2.2,0.3,2.3;2,2.8,0.4,3.1;3,3.9,0.8,4.3;4,5,1,5.2;5,5.9,1.4,6;6,7.1,1.8,7.4;7,8.2,2.2,8;8,8.8,2.4,9.2;9,9.9,2.7,10.1;10,11.1,3.2,11.4}, "X", "M, Mtwo", "Y", 0.05, 150, 5)

Expected output:

path	coef	se	pval	CI2.5	CI97.5	sig
M ~ X	1.00061	0.0177938	1.10999e-11	0.959573	1.04164	Yes
Mtwo ~ X	0.328485	0.0105931	1.27149e-9	0.304057	0.352913	Yes
Y ~ M	0.99363	0.363678	0.029247	0.133667	1.85359	Yes
Y ~ Mtwo	0.0157464	1.10462	0.989024	-2.59628	2.62777	No
Total	1.00121	0.0174025	9.24967e-12	0.961082	1.04134	Yes
Direct	0.798207	0.389274	0.0861822	-0.154312	1.75073	No
Indirect M	-0.0106336	1.3718	0.946667	-3.21739	1.24809	No
Indirect Mtwo	0.213639	0.766531	0.426667	-0.191389	2.34114	No

Example 4: Different alpha and bootstrap size

Inputs:

data			x	m	y	alpha	n_boot	seed
X	M	Y	X	M	Y	0.1	120	11
1	2.2	2.3
2	2.8	3.1
3	3.9	4.3
4	5	5.2
5	5.9	6
6	7.1	7.4
7	8.2	8
8	8.8	9.2
9	9.9	10.1
10	11.1	11.4

Excel formula:

=MEDIATION_ANALYSIS({"X","M","Y";1,2.2,2.3;2,2.8,3.1;3,3.9,4.3;4,5,5.2;5,5.9,6;6,7.1,7.4;7,8.2,8;8,8.8,9.2;9,9.9,10.1;10,11.1,11.4}, "X", "M", "Y", 0.1, 120, 11)

Expected output:

path	coef	se	pval	CI5.0	CI95.0	sig
M ~ X	1.00061	0.0177938	1.10999e-11	0.967518	1.03369	Yes
Y ~ M	0.998804	0.0208822	4.03763e-11	0.959973	1.03764	Yes
Total	1.00121	0.0174025	9.24967e-12	0.968851	1.03357	Yes
Direct	0.714205	0.354058	0.0834744	0.0434153	1.385	Yes
Indirect	0.287007	0.405663	0.433333	-0.244521	1.01084	No

Python Code

external_packages = ['pingouin==0.6.0']

import pandas as pd
import pingouin as pg

def mediation_analysis(data, x, m, y, covar=None, alpha=0.05, n_boot=500, seed=None):
    """
    Perform causal mediation analysis with bootstrap confidence intervals.

    See: https://pingouin-stats.org/generated/pingouin.mediation_analysis.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        data (list[list]): Input table as a 2D range with header row followed by observation rows.
        x (str): Predictor column name.
        m (str): Mediator column name, or comma-separated mediator column names for parallel mediation.
        y (str): Outcome column name.
        covar (str, optional): Optional covariate column name, or comma-separated covariate names. Default is None.
        alpha (float, optional): Significance level for confidence intervals (unitless). Default is 0.05.
        n_boot (int, optional): Number of bootstrap resamples used for indirect effect inference. Default is 500.
        seed (int, optional): Optional random seed for reproducible bootstrap sampling. Default is None.

    Returns:
        list[list]: Mediation summary table with path coefficients, uncertainty metrics, confidence intervals, p-values, and significance labels.
    """
    try:
        def to2d(x):
            return [[x]] if not isinstance(x, list) else x

        def parse_names(value):
            if value is None:
                return None
            text = str(value).strip()
            if text == "":
                return None
            names = [part.strip() for part in text.split(",") if part.strip()]
            if len(names) == 0:
                return None
            return names

        matrix = to2d(data)
        if not isinstance(matrix, list) or len(matrix) < 2 or not all(isinstance(row, list) for row in matrix):
            return "Error: Invalid input - data must be a 2D list with a header row and at least one data row"

        header = [str(col).strip() for col in matrix[0]]
        if len(header) == 0 or any(col == "" for col in header):
            return "Error: Invalid input - header row must contain non-empty column names"

        width = len(header)
        rows = []
        for row in matrix[1:]:
            row_values = list(row)
            if len(row_values) < width:
                row_values = row_values + [None] * (width - len(row_values))
            elif len(row_values) > width:
                row_values = row_values[:width]
            rows.append(row_values)

        if len(rows) == 0:
            return "Error: Invalid input - at least one observation row is required"

        frame = pd.DataFrame(rows, columns=header)

        m_list = parse_names(m)
        if m_list is None:
            return "Error: Invalid input - m must contain at least one mediator column name"
        m_arg = m_list[0] if len(m_list) == 1 else m_list

        covar_list = parse_names(covar)
        covar_arg = None if covar_list is None else (covar_list[0] if len(covar_list) == 1 else covar_list)

        seed_arg = None if seed is None else int(seed)

        stats = pg.mediation_analysis(
            data=frame,
            x=str(x),
            m=m_arg,
            y=str(y),
            covar=covar_arg,
            alpha=float(alpha),
            n_boot=int(n_boot),
            seed=seed_arg,
            return_dist=False,
        )

        columns = [str(col) for col in stats.columns.tolist()]
        output = [columns]
        for row in stats.itertuples(index=False, name=None):
            out_row = []
            for value in row:
                if pd.isna(value):
                    out_row.append("")
                else:
                    out_row.append(value)
            output.append(out_row)

        max_len = max(len(r) for r in output)
        return [r + [""] * (max_len - len(r)) for r in output]
    except Exception as e:
        return f"Error: {str(e)}"

Online Calculator

data *

Input table as a 2D range with header row followed by observation rows.

x *

Predictor column name.

m *

Mediator column name, or comma-separated mediator column names for parallel mediation.

y *

Outcome column name.

covar

Optional covariate column name, or comma-separated covariate names.

alpha

Significance level for confidence intervals (unitless).

n_boot

Number of bootstrap resamples used for indirect effect inference.

seed

Optional random seed for reproducible bootstrap sampling.