Causal Methods
REFERENCE

Assumptions cheatsheet

Identifying assumptions for every method, with testability indicators and the standard diagnostic each assumption calls for.

TESTABILITY: testable · partially testable · not directly testable
DiD

Difference-in-differences

Estimand: ATT
Parallel trends: E[Y(0)ᵢₜ − Y(0)ᵢ,ₜ₋₁ | D=1] = E[Y(0)ᵢₜ − Y(0)ᵢ,ₜ₋₁ | D=0] (partially testable)

Absent treatment, treated and control groups would have followed the same outcome trend.

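Diagnostic: Pre-period event study. Check that pre-treatment coefficients are jointly indistinguishable from zero; this supports parallel trends before treatment but cannot prove they would have held afterwards.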
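The event-study check can be sketched on group means. This is a minimal illustration with hypothetical numbers (the `treated` and `control` dictionaries and the choice of base period are assumptions, not data from this page); it produces point estimates only, so a real analysis would still need standard errors and a joint test.

```python
# Hypothetical mean outcomes by period; periods 0-2 are pre-treatment, period 3 is post.
treated = {0: 10.0, 1: 11.0, 2: 12.0, 3: 16.0}
control = {0: 5.0, 1: 6.0, 2: 7.0, 3: 8.0}
BASE_PERIOD = 2  # last pre-treatment period, used as the reference


def event_study(treated, control, base):
    """Treated-control gap in each period, relative to the gap in the base period."""
    base_gap = treated[base] - control[base]
    return {t: (treated[t] - control[t]) - base_gap for t in sorted(treated)}


coefs = event_study(treated, control, BASE_PERIOD)
pre = {t: c for t, c in coefs.items() if t < BASE_PERIOD}
post = {t: c for t, c in coefs.items() if t > BASE_PERIOD}
print("pre-period coefficients:", pre)    # should be close to zero under parallel trends
print("post-period coefficients:", post)  # the DiD estimates
```

In this toy panel the pre-period gaps are constant, so the pre-period coefficients are zero and the post-period coefficient is the DiD estimate.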
No anticipation (partially testable)

Units do not alter behavior before treatment in response to expected future treatment.

Diagnostic: Inspect leads in the event study; significant pre-trends may reflect anticipation.
SUTVA (not directly testable)

No interference between units and a single version of treatment.

DiD

Staggered DiD (Callaway–Sant'Anna)

Estimand: ATT(g,t)
Parallel trends (conditional): E[Y(0)ᵢₜ − Y(0)ᵢ,ₜ₋₁ | Gᵢ = g, X] = E[Y(0)ᵢₜ − Y(0)ᵢ,ₜ₋₁ | Cᵢ = 1, X] (partially testable)

For each cohort g, trends would have matched those of the not-yet-treated (or never-treated) units, conditional on covariates.

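Diagnostic: Pre-treatment group-time ATTs should be zero. In the R did package, att_gt() reports a Wald pre-test of parallel trends, and aggte(type = "dynamic") summarizes the pre-treatment event-time estimates.
No anticipation (partially testable)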

Cohort g does not respond before period g.

Diagnostic: Inspect group-time ATT estimates in pre-treatment periods.
No forbidden comparisons (testable)

Already-treated units are not used as the comparison group for later-treated cohorts.

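Diagnostic: Ensured by design in the CS estimator, which uses only not-yet-treated or never-treated units as controls (control_group = "notyettreated" or "nevertreated" in the did package).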
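The comparison-group rule can be made concrete with a small sketch. It is not the Callaway–Sant'Anna estimator itself (no covariates, no inference); the `panel` layout and the `att_gt` helper are hypothetical names used only to show which units are allowed into the control group for a given cohort g and period t.

```python
# Hypothetical panel: unit -> (first treated period, or None if never treated; outcome by period)
panel = {
    "a": (2, {1: 1.0, 2: 4.0, 3: 5.0}),
    "b": (3, {1: 2.0, 2: 3.0, 3: 7.0}),
    "c": (None, {1: 0.0, 2: 1.0, 3: 2.0}),
}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def att_gt(panel, g, t):
    """Unconditional ATT(g, t) for t >= g, using period g - 1 as the base period."""
    base = g - 1
    cohort = [y for first, y in panel.values() if first == g]
    # Clean controls: never treated, or first treated strictly after period t.
    # Units already treated by period t are excluded (no forbidden comparisons).
    clean = [y for first, y in panel.values() if first is None or first > t]
    if not cohort or not clean:
        raise ValueError("empty cohort or no clean comparison units")

    def change(y):
        return y[t] - y[base]

    return mean(change(y) for y in cohort) - mean(change(y) for y in clean)


print(att_gt(panel, 2, 2))  # controls: b (not yet treated) and c (never treated)
print(att_gt(panel, 2, 3))  # controls: c only, because b is treated by period 3
print(att_gt(panel, 3, 3))  # controls: c only, because a is already treated
```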
IV

Instrumental variables (2SLS)

Estimand: LATE
Relevance: Cov(Z, D) ≠ 0 (testable)

The instrument is correlated with treatment. With a weak instrument, 2SLS is biased toward OLS and its confidence intervals are unreliable.

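Diagnostic: First-stage F-statistic above 10 (the Staiger–Stock rule of thumb; Stock–Yogo tabulate formal critical values). Report the Cragg–Donald F under homoskedasticity or the Kleibergen–Paap F with robust or clustered errors.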
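For a single instrument the first-stage F is the squared t-statistic on the instrument. The sketch below computes the homoskedastic version by hand on simulated data; the function name `first_stage_f` and the simulated coefficients are illustrative assumptions, and robust variants such as Kleibergen–Paap would give different numbers.

```python
import random


def first_stage_f(z, d):
    """Homoskedastic F for the slope in an OLS regression of d on z (one instrument, so F = t^2)."""
    n = len(z)
    z_bar, d_bar = sum(z) / n, sum(d) / n
    szz = sum((zi - z_bar) ** 2 for zi in z)
    szd = sum((zi - z_bar) * (di - d_bar) for zi, di in zip(z, d))
    slope = szd / szz
    intercept = d_bar - slope * z_bar
    rss = sum((di - intercept - slope * zi) ** 2 for zi, di in zip(z, d))
    var_slope = (rss / (n - 2)) / szz  # residual variance divided by the instrument's variation
    return slope ** 2 / var_slope


random.seed(0)
z = [random.gauss(0, 1) for _ in range(500)]
strong = [0.5 * zi + random.gauss(0, 1) for zi in z]   # instrument clearly moves treatment
weak = [0.02 * zi + random.gauss(0, 1) for zi in z]    # instrument barely moves treatment
print("strong first stage F:", round(first_stage_f(z, strong), 1))
print("weak first stage F:", round(first_stage_f(z, weak), 1))
```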
Exclusion restriction: Cov(Z, ε) = 0 (not directly testable)

The instrument affects the outcome only through treatment — no direct path from Z to Y.

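Diagnostic: Argue from institutional knowledge. With more instruments than endogenous regressors, an overidentification test (Sargan–Hansen) is available, but it only tests whether the instruments agree with one another; it cannot detect a violation they all share.
Independence: Z ⊥⊥ (Y(0), Y(1), D(0), D(1)) (partially testable)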

The instrument is as-good-as-randomly assigned, independent of potential outcomes and treatment compliance types.

Diagnostic: Balance test. The instrument should be uncorrelated with pre-treatment covariates.
Monotonicity (not directly testable)

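No defiers: if Z moves D for any unit, it moves it in the same direction for all. Together with relevance (which makes the complier group non-empty), this lets the Wald/2SLS ratio be read as the average effect for compliers.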

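Diagnostic: Argue from institutional knowledge of how the instrument operates. The sign of the first stage shows the average response, not that no unit responds in the opposite direction. Partial tests are available with multi-valued instruments.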
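With a binary instrument, the four assumptions above imply that the ratio of the reduced form to the first stage (the Wald estimator) identifies the LATE. A minimal sketch on a hypothetical sample of eight units, containing always-takers, a never-taker and compliers with a true treatment effect of 2 (all values are assumptions chosen for illustration):

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)


def wald_late(z, d, y):
    """Reduced form divided by first stage, for a binary instrument z."""
    y1 = mean(yi for zi, yi in zip(z, y) if zi == 1)
    y0 = mean(yi for zi, yi in zip(z, y) if zi == 0)
    d1 = mean(di for zi, di in zip(z, d) if zi == 1)
    d0 = mean(di for zi, di in zip(z, d) if zi == 0)
    first_stage = d1 - d0
    if first_stage == 0:
        raise ValueError("instrument does not move treatment (relevance fails)")
    return (y1 - y0) / first_stage


# Hypothetical data: outcome is 1 + 2 * d, so the effect is 2 for every treated unit.
z = [0, 0, 0, 0, 1, 1, 1, 1]
d = [0, 0, 0, 1, 0, 1, 1, 1]
y = [1 + 2 * di for di in d]
late = wald_late(z, d, y)
print("Wald estimate of the LATE:", late)
```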
RDD

Sharp RDD

Estimand: ATE (local)
Continuity at threshold: E[Y(d) | X = x] continuous in x at x = c, for d ∈ {0,1} (partially testable)

Potential outcomes are smooth functions of the running variable at the cutoff. Any discontinuity in observed outcomes is attributed to treatment.

Diagnostic: Covariate smoothness test. Pre-treatment covariates should not jump at the cutoff.
No manipulation (testable)

Units cannot precisely sort above or below the cutoff. Assignment is effectively random in a neighborhood of c.

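Diagnostic: Density test for a jump in the running variable at the cutoff: McCrary (2008), or the local-polynomial version of Cattaneo, Jansson and Ma implemented in the rddensity package (R, Stata, Python).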
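The idea behind a density test can be illustrated with a much cruder check: count observations in a narrow window on each side of the cutoff and ask whether the split is compatible with 50/50. This is a stand-in for intuition only, not the McCrary or rddensity procedure, which estimate the density on each side with local polynomials. The function name, window width, and sample values are assumptions.

```python
from math import comb


def binomial_sorting_check(running, cutoff, window):
    """Exact two-sided binomial p-value for equal counts just left and right of the cutoff."""
    left = sum(1 for x in running if cutoff - window <= x < cutoff)
    right = sum(1 for x in running if cutoff <= x <= cutoff + window)
    n = left + right
    if n == 0:
        raise ValueError("no observations inside the window")
    # Without sorting, each unit in a narrow window lands on either side with probability 1/2.
    pmf = [comb(n, k) / 2 ** n for k in range(n + 1)]
    observed = pmf[right]
    p_value = sum(p for p in pmf if p <= observed * (1 + 1e-9))
    return left, right, min(1.0, p_value)


balanced = [-0.1] * 5 + [0.1] * 5   # hypothetical running variable, cutoff at 0
bunched = [0.1] * 10                # every unit sits just above the cutoff
print(binomial_sorting_check(balanced, 0.0, 0.5))
print(binomial_sorting_check(bunched, 0.0, 0.5))
```

A small p-value in the second case is the kind of bunching a formal density test is designed to detect.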
Local covariate smoothness (testable)

Observed covariates are continuous at the cutoff, ruling out discontinuities that could confound the estimate.

Diagnostic: Run the main RDD specification with each covariate as the outcome. Coefficients should be near zero.
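A placebo regression of this kind can be sketched with a local linear fit on each side of the cutoff. The example uses a uniform kernel, a fixed bandwidth, and noiseless hypothetical data; `rd_jump`, `age`, and `outcome` are illustrative names, and a real analysis would use a data-driven bandwidth and robust inference.

```python
def ols_line(x, y):
    """Intercept and slope from a simple OLS fit."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - slope * x_bar, slope


def rd_jump(running, outcome, cutoff, bandwidth):
    """Local linear fit on each side (uniform kernel); returns the jump at the cutoff."""
    left = [(x - cutoff, y) for x, y in zip(running, outcome) if cutoff - bandwidth <= x < cutoff]
    right = [(x - cutoff, y) for x, y in zip(running, outcome) if cutoff <= x <= cutoff + bandwidth]
    a_left, _ = ols_line(*zip(*left))
    a_right, _ = ols_line(*zip(*right))
    return a_right - a_left


running = [-0.4, -0.3, -0.2, -0.1, 0.1, 0.2, 0.3, 0.4]
age = [30 + 5 * x for x in running]                            # smooth pre-treatment covariate
outcome = [1 + 2 * x + (3 if x >= 0 else 0) for x in running]  # outcome with a jump of 3

print("jump in covariate:", rd_jump(running, age, 0.0, 0.5))
print("jump in outcome:", rd_jump(running, outcome, 0.0, 0.5))
```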
RDD

Fuzzy RDD

Estimand: LATE (local)
Continuity at threshold (partially testable)

Same as sharp RDD — both potential outcomes continuous in the running variable at c.

Diagnostic: Covariate smoothness; visual inspection of outcome and forcing variable plots.
No manipulation (testable)

Same as sharp RDD.

Diagnostic: McCrary density test.
Monotonicity (partially testable)

Treatment probability weakly increases (or decreases) at the cutoff for all units — no defiers.

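Diagnostic: The first-stage jump in treatment probability at the cutoff should be clearly nonzero with the expected sign. This checks relevance; the absence of defiers has to be argued from how the assignment rule operates.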
Match

Matching & IPW

Estimand: ATT or ATE
Conditional ignorability (CIA): (Y(1), Y(0)) ⊥⊥ D | X (not directly testable)

All common causes of D and Y are observed in X. Unobserved confounding would invalidate the estimate.

Diagnostic: Rosenbaum bounds (rbounds) quantify how strong hidden bias would need to be to explain away the result.
Overlap / positivity: 0 < P(D = 1 | X) < 1 for all X (testable)

Every covariate profile has a nonzero probability of treatment and control. Propensity scores of 0 or 1 violate overlap.

Diagnostic: Inspect propensity score distributions for treated and control units. Trim or discard units outside common support.
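Trimming to common support can be sketched as follows. The propensity scores here are hypothetical numbers standing in for the output of a fitted propensity model, and the min/max rule is only one of several conventions for defining common support.

```python
def common_support(ps_treated, ps_control):
    """Range of propensity scores covered by both groups (min/max rule)."""
    low = max(min(ps_treated), min(ps_control))
    high = min(max(ps_treated), max(ps_control))
    return low, high


def trim(units, low, high):
    """Keep (unit_id, propensity) pairs whose score lies inside [low, high]."""
    return [(unit, p) for unit, p in units if low <= p <= high]


# Hypothetical estimated propensity scores.
treated_units = [("t1", 0.35), ("t2", 0.6), ("t3", 0.8), ("t4", 0.97)]
control_units = [("c1", 0.05), ("c2", 0.2), ("c3", 0.4), ("c4", 0.7)]

low, high = common_support([p for _, p in treated_units], [p for _, p in control_units])
kept_treated = trim(treated_units, low, high)
kept_control = trim(control_units, low, high)
print("common support:", (low, high))
print("kept:", kept_treated, kept_control)
```

Trimming changes the population the estimate refers to, so the number of discarded units is worth reporting alongside the result.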
SUTVA (not directly testable)

No interference between units; a single version of treatment.

SC

Synthetic control

Estimand: ATT
Pre-treatment fit (testable)

The synthetic control closely tracks the treated unit's pre-treatment outcome path. Poor fit implies poor counterfactual quality.

Diagnostic: Inspect the pre-treatment MSPE. A large MSPE relative to the placebo fits obtained for donor units casts doubt on the design.
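The pre-treatment fit measure is straightforward to compute once weights are in hand. The sketch below assumes the weights have already been estimated elsewhere; the donor paths, weights, and treated path are hypothetical values.

```python
def synthetic_path(weights, donors):
    """Weighted average of donor outcome paths; weights maps donor name -> weight."""
    periods = len(next(iter(donors.values())))
    return [sum(w * donors[name][t] for name, w in weights.items()) for t in range(periods)]


def mspe(actual, synthetic):
    """Mean squared prediction error between two outcome paths of equal length."""
    return sum((a - s) ** 2 for a, s in zip(actual, synthetic)) / len(actual)


def valid_weights(weights, tol=1e-9):
    """Standard synthetic-control constraints: non-negative weights that sum to one."""
    return all(w >= 0 for w in weights.values()) and abs(sum(weights.values()) - 1.0) < tol


# Hypothetical pre-treatment outcome paths.
donors = {"A": [1.0, 2.0, 3.0], "B": [3.0, 4.0, 5.0]}
weights = {"A": 0.5, "B": 0.5}
treated_pre = [2.0, 3.0, 4.5]

synthetic_pre = synthetic_path(weights, donors)
pre_mspe = mspe(treated_pre, synthetic_pre)
print("weights valid:", valid_weights(weights))
print("pre-treatment MSPE:", pre_mspe)
```

Repeating the same calculation with each donor treated as a placebo gives the reference distribution against which the treated unit's MSPE is judged.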
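Convex hull (testable)

The treated unit's pre-treatment characteristics lie within the convex hull of the donor pool. Outside the hull, no set of non-negative weights summing to one can reproduce the treated unit, so matching it requires extrapolation, which shows up as negative weights in unconstrained variants.

Diagnostic: Compare the treated unit's pre-treatment characteristics with the donor range. Abadie (2021) recommends restricting to non-negative weights; with that restriction in place, a treated unit outside the hull appears as poor pre-treatment fit, not as negative weights.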
No spillovers to donors (partially testable)

Treatment of the target unit does not affect donor units' outcomes (SUTVA at the aggregate level).

Diagnostic: Exclude plausibly contaminated donors from the pool.
DML

Double ML

Estimand: ATE or ATT
Conditional ignorability: (Y(1), Y(0)) ⊥⊥ D | X (not directly testable)

No unobserved confounders, given the high-dimensional covariate set X.

Overlap: 0 < P(D = 1 | X) < 1 (testable)

Propensity scores bounded away from 0 and 1.

Diagnostic: Inspect cross-fitted propensity score distribution.
Neyman orthogonality (not directly testable)

The score function has zero Gateaux derivative with respect to nuisance parameters at the truth. Guarantees nuisance estimation errors don't contaminate the ATE at first order.

Diagnostic: Verified by construction for PLR and IRM scores; use the built-in DoubleML estimators.
Nuisance convergence rate: faster than n⁻¹/⁴ for each nuisance model (partially testable)

Cross-fitted ML models for E[Y|X] and E[D|X] must converge fast enough that the product of their errors is o(n⁻¹/²). Flexible learners (lasso, forests) achieve this only under structural conditions such as sparsity or smoothness; it is not automatic.

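Diagnostic: Compare cross-fitted out-of-sample prediction error across candidate learners, and check that the estimate is stable across learners and repeated sample splits. The residuals of Y and D should be correlated with each other whenever the effect is nonzero, so low residual correlation is not a validity check.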
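The role of cross-fitted residuals can be seen in a stripped-down partially linear model. In this sketch a simple linear fit stands in for the ML learners, folds are assigned by index parity, and the data are noiseless hypothetical values built so that the true effect is 2; none of this reflects the DoubleML package's internals.

```python
def ols_line(x, y):
    """Intercept and slope from a simple OLS fit."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - slope * x_bar, slope


def cross_fit_residuals(x, target, folds):
    """Residuals of target on x, with each fold predicted by a line fit on the other folds."""
    resid = [0.0] * len(x)
    for k in range(folds):
        train = [i for i in range(len(x)) if i % folds != k]
        held_out = [i for i in range(len(x)) if i % folds == k]
        a, b = ols_line([x[i] for i in train], [target[i] for i in train])
        for i in held_out:
            resid[i] = target[i] - (a + b * x[i])
    return resid


def plr_theta(x, d, y, folds=2):
    """Partialling-out estimate: regress the residualized outcome on the residualized treatment."""
    res_d = cross_fit_residuals(x, d, folds)
    res_y = cross_fit_residuals(x, y, folds)
    return sum(rd * ry for rd, ry in zip(res_d, res_y)) / sum(rd * rd for rd in res_d)


# Hypothetical data: d = x + v and y = 2 * d + 3 * x, so the true effect of d is 2.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
v = [0.5, -0.5, 0.3, -0.3, -0.2, 0.2, 0.4, -0.4]
d = [xi + vi for xi, vi in zip(x, v)]
y = [2 * di + 3 * xi for di, xi in zip(d, x)]
theta = plr_theta(x, d, y)
print("estimated effect:", theta)
```

The estimate is driven entirely by the covariance between the two sets of residuals, which is why that covariance is the quantity of interest and not something expected to vanish.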
CF

Causal forest

Estimand: CATE
Conditional ignorability: (Y(1), Y(0)) ⊥⊥ D | X (not directly testable)

No unobserved confounders given X.

Overlap: η < P(D = 1 | X) < 1 − η (testable)

Propensity bounded away from 0 and 1 uniformly in X.

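Diagnostic: Histogram of the estimated propensities (W.hat in a grf causal_forest object) or of a separately fitted propensity model. grf::get_forest_weights() returns the forest's kernel weights, not propensities.
Honesty (testable)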

Each tree uses separate subsamples for splitting and for estimation. Required for valid confidence intervals on CATEs.

Diagnostic: Ensured by default in grf. Disable with honesty = FALSE only for exploratory work.
DR

AIPW / DR learner

Estimand: ATE or CATE
Conditional ignorability (not directly testable)

No unobserved confounders given X.

Overlap (testable)

Propensity bounded away from 0 and 1.

Diagnostic: Trim propensity scores at a small ε if needed.
Double robustness (partially testable)

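Consistent if either the outcome model or the propensity model is correctly specified; both need not be. Achieves the semiparametric efficiency bound when both are consistent and the product of their errors is o(n⁻¹/²), for example when each converges faster than n⁻¹/⁴.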

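Diagnostic: Cross-fit both nuisance models, check covariate balance after inverse-propensity weighting, and compare the AIPW estimate with the outcome-regression-only and IPW-only estimates.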
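The double robustness property can be illustrated directly from the AIPW score. The sketch takes nuisance predictions as given; the four-unit sample is hypothetical and deliberately balanced so that each single-misspecification case recovers the true effect of 2 exactly, whereas in real samples the property holds only asymptotically.

```python
def aipw_ate(y, d, mu1, mu0, e):
    """AIPW estimate of the ATE from outcomes, treatment, outcome predictions, and propensities."""
    scores = []
    for yi, di, m1, m0, ei in zip(y, d, mu1, mu0, e):
        # Outcome-model contrast plus an inverse-propensity-weighted residual correction.
        correction = di * (yi - m1) / ei - (1 - di) * (yi - m0) / (1 - ei)
        scores.append(m1 - m0 + correction)
    return sum(scores) / len(scores)


# Hypothetical sample: untreated outcomes 1, 2, 2, 1 and a constant effect of 2.
d = [1, 0, 1, 0]
y = [3.0, 2.0, 4.0, 1.0]
true_mu0 = [1.0, 2.0, 2.0, 1.0]
true_mu1 = [3.0, 4.0, 4.0, 3.0]
true_e = [0.5, 0.5, 0.5, 0.5]

# Correct outcome model, badly wrong propensities.
ate_wrong_propensity = aipw_ate(y, d, true_mu1, true_mu0, [0.9] * 4)
# Correct propensities, badly wrong outcome model (predicts zero everywhere).
ate_wrong_outcome = aipw_ate(y, d, [0.0] * 4, [0.0] * 4, true_e)
print(ate_wrong_propensity, ate_wrong_outcome)
```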