DiD
Difference-in-differences
Estimand: ATT
Parallel trends: E[Y(0)ᵢₜ − Y(0)ᵢ,ₜ₋₁ | D = 1] = E[Y(0)ᵢₜ − Y(0)ᵢ,ₜ₋₁ | D = 0] (partially testable)
Absent treatment, treated and control groups would have followed the same outcome trend.
Diagnostic: run a pre-period event study and check that the pre-treatment coefficients are jointly zero. Passing this check supports parallel trends but cannot prove it for the post-treatment periods.
No anticipation (partially testable)
Units do not alter behavior before treatment in response to expected future treatment.
Diagnostic: inspect the leads in the event study; significant pre-trends may reflect anticipation.
SUTVA (not directly testable)
No interference between units and a single version of treatment.
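The parallel-trends diagnostic above can be illustrated with a small sketch. It computes the 2x2 DiD estimate from group-by-period means and reuses the same function as a placebo check on two pre-treatment periods. The toy panel, the period coding (−1 and 0 before treatment, 1 after), and the function names are all hypothetical.

```python
from statistics import mean

# Hypothetical toy panel: (unit, treated_group, period, outcome).
# Periods -1 and 0 are pre-treatment; period 1 is post-treatment.
rows = [
    ("a", 1, -1, 8.0), ("a", 1, 0, 10.0), ("a", 1, 1, 15.0),
    ("b", 1, -1, 10.0), ("b", 1, 0, 12.0), ("b", 1, 1, 17.0),
    ("c", 0, -1, 6.0), ("c", 0, 0, 8.0), ("c", 0, 1, 10.0),
    ("d", 0, -1, 4.0), ("d", 0, 0, 6.0), ("d", 0, 1, 8.0),
]


def cell_mean(rows, group, period):
    """Average outcome for one group-by-period cell."""
    return mean(y for _, g, t, y in rows if g == group and t == period)


def did(rows, pre, post):
    """(treated change) minus (control change) between two periods."""
    treated_change = cell_mean(rows, 1, post) - cell_mean(rows, 1, pre)
    control_change = cell_mean(rows, 0, post) - cell_mean(rows, 0, pre)
    return treated_change - control_change


att_hat = did(rows, pre=0, post=1)    # main estimate
placebo = did(rows, pre=-1, post=0)   # pre-trend check, should be near zero

print("ATT estimate:", att_hat)
print("Placebo (pre-period) DiD:", placebo)
```

In this constructed panel both groups rise by 2 between the two pre-periods, so the placebo difference is zero, while the post-period estimate picks up the extra change in the treated group. With real data the placebo would be tested against its standard error rather than compared with zero directly.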
DiD
Staggered DiD (Callaway–Sant'Anna)
Estimand: ATT(g,t)
Parallel trends (conditional): E[Y(0)ᵢₜ − Y(0)ᵢ,ₜ₋₁ | Gᵢ = g, X] = E[Y(0)ᵢₜ − Y(0)ᵢ,ₜ₋₁ | Cᵢ = 1, X] (partially testable)
For each cohort g, untreated-outcome trends would have matched those of the comparison units (Cᵢ = 1: not-yet-treated or never-treated), conditional on covariates X.
Diagnostic: pre-treatment group-time ATTs should be zero. In the R did package, att_gt() reports a joint Wald pre-test, and aggte(type = "dynamic") aggregates the estimates into an event study.
No anticipation (partially testable)
Cohort g does not respond before period g.
Diagnostic: inspect group-time ATT estimates in pre-treatment periods.
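No forbidden comparisons (testable)
Already-treated units are not used as the comparison group for later-treated cohorts.
Diagnostic: ensured by design in the Callaway–Sant'Anna estimator, which draws controls only from not-yet-treated or never-treated units (control_group = "notyettreated" or "nevertreated" in did::att_gt()).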
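To make the comparison-group rule concrete, here is a minimal sketch of a group-time ATT that uses only not-yet-treated and never-treated units as controls. It covers post-treatment periods (t ≥ g) with period g − 1 as the base, ignores covariates, and is not the did package's implementation. The toy panel, the `NEVER` marker, and the function name are hypothetical.

```python
from statistics import mean

NEVER = None  # hypothetical marker for never-treated units

# Hypothetical toy panel: unit -> (cohort g = first treated period, {period: outcome}).
# Untreated outcomes rise by 1 per period; cohort 2 gains 2 and cohort 3 gains 4 once treated.
panel = {
    "u1": (2, {1: 6, 2: 9, 3: 10, 4: 11}),
    "u2": (2, {1: 4, 2: 7, 3: 8, 4: 9}),
    "u3": (3, {1: 5, 2: 6, 3: 11, 4: 12}),
    "u4": (3, {1: 7, 2: 8, 3: 13, 4: 14}),
    "u5": (NEVER, {1: 3, 2: 4, 3: 5, 4: 6}),
    "u6": (NEVER, {1: 9, 2: 10, 3: 11, 4: 12}),
}


def group_time_att(panel, g, t):
    """ATT(g, t) for t >= g, with base period g - 1 and clean controls only."""
    base = g - 1

    def change(outcomes):
        return outcomes[t] - outcomes[base]

    treated = [change(y) for cohort, y in panel.values() if cohort == g]
    # Controls are never-treated units or units whose treatment starts after t.
    # Already-treated cohorts are excluded, so no forbidden comparison occurs.
    controls = [
        change(y)
        for cohort, y in panel.values()
        if cohort is NEVER or cohort > t
    ]
    return mean(treated) - mean(controls)


for g, t in [(2, 2), (2, 3), (3, 3)]:
    print(f"ATT({g},{t}) =", group_time_att(panel, g, t))
```

For ATT(3,3) the control set contains only the never-treated units u5 and u6; cohort 2 is already treated by period 3 and is left out.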
IV
Instrumental variables (2SLS)
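Estimand: LATE
Relevance: Cov(Z, D) ≠ 0 (testable)
The instrument is correlated with treatment. With weak instruments, 2SLS is biased toward OLS and conventional inference is unreliable.
Diagnostic: first-stage F-statistic above 10 (the Staiger–Stock rule of thumb; Stock–Yogo provide formal critical values). Report the Cragg–Donald F, or the Kleibergen–Paap F when errors are not i.i.d.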
Exclusion restriction: Cov(Z, ε) = 0 (not directly testable)
The instrument affects the outcome only through treatment: there is no direct path from Z to Y.
Diagnostic: argue from institutional knowledge. With more instruments than endogenous regressors, an overidentification test (Sargan–Hansen) is available, but it only checks whether the instruments agree with one another, not whether any of them is valid.
Independence: Z ⊥⊥ (Y(0), Y(1), D(0), D(1)) (partially testable)
The instrument is as-good-as-randomly assigned, independent of potential outcomes and treatment compliance types.
Diagnostic: balance test. The instrument should be uncorrelated with pre-treatment covariates.
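Monotonicity (not directly testable)
No defiers: if Z moves D for any unit, it moves it in the same direction for all. Together with relevance, which guarantees that compliers exist, this lets the IV estimand be read as the average effect for compliers.
Diagnostic: argue from the assignment mechanism. A first stage of the expected sign is consistent with monotonicity but cannot rule out defiers. Partial tests are available with multi-valued instruments.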
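The relevance diagnostic and the LATE estimand can be sketched for the simplest case of one binary instrument and a binary treatment. The example computes the Wald ratio (reduced form over first stage) and a homoskedastic first-stage F, which for a single instrument is the squared t-statistic on Z. The toy sample and function names are hypothetical, and the F here is not the Cragg–Donald or Kleibergen–Paap statistic.

```python
from statistics import mean

# Hypothetical toy sample of (z, d, y) with binary instrument z and treatment d.
# Outcomes are constructed as y = 1 + 2*d, so the effect is 2 for every unit.
sample = (
    [(0, 1, 3.0)] * 2 + [(0, 0, 1.0)] * 8    # z = 0: 20% take treatment
    + [(1, 1, 3.0)] * 7 + [(1, 0, 1.0)] * 3  # z = 1: 70% take treatment
)


def group_mean(sample, z_value, index):
    """Mean of column `index` among rows with instrument value z_value."""
    return mean(row[index] for row in sample if row[0] == z_value)


first_stage = group_mean(sample, 1, 1) - group_mean(sample, 0, 1)
reduced_form = group_mean(sample, 1, 2) - group_mean(sample, 0, 2)
wald = reduced_form / first_stage  # IV estimate with one binary instrument


def first_stage_f(sample):
    """Homoskedastic F for the regression of d on z (squared t-statistic)."""
    n = len(sample)
    z = [row[0] for row in sample]
    d = [row[1] for row in sample]
    z_bar, d_bar = mean(z), mean(d)
    sxx = sum((zi - z_bar) ** 2 for zi in z)
    slope = sum((zi - z_bar) * (di - d_bar) for zi, di in zip(z, d)) / sxx
    intercept = d_bar - slope * z_bar
    rss = sum((di - intercept - slope * zi) ** 2 for zi, di in zip(z, d))
    slope_variance = (rss / (n - 2)) / sxx
    return slope ** 2 / slope_variance


f_stat = first_stage_f(sample)
print("first stage:", first_stage)
print("Wald / IV estimate:", wald)
print("first-stage F:", round(f_stat, 2))
```

With only 20 observations the first-stage F in this toy sample is about 6, below the rule-of-thumb threshold of 10, even though the first-stage difference of 0.5 is large. The same first stage in a bigger sample would pass.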
RDD
Sharp RDD
Estimand: ATE (local)
Continuity at threshold: E[Y(d) | X = x] continuous in x at x = c, for d ∈ {0, 1} (partially testable)
Potential outcomes are smooth functions of the running variable at the cutoff. Any discontinuity in observed outcomes is attributed to treatment.
Diagnostic: covariate smoothness test. Pre-treatment covariates should not jump at the cutoff.
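No manipulation (testable)
Units cannot precisely sort above or below the cutoff. Assignment is effectively random in a neighborhood of c.
Diagnostic: density test of the running variable at the cutoff, either McCrary (2008) or the local polynomial density test of Cattaneo, Jansson and Ma, which is what rddensity::rddensity() implements (R/Python).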
Local covariate smoothness (testable)
Observed covariates are continuous at the cutoff, ruling out discontinuities that could confound the estimate.
Diagnostic: run the main RDD specification with each covariate as the outcome. The estimated jumps should be near zero.
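The covariate-smoothness check reuses the main estimator with a covariate in place of the outcome. A minimal sketch, assuming a local linear fit with a uniform kernel on each side of the cutoff; the data, the bandwidth of 1.0, and the function names are hypothetical, and no bandwidth selection or bias correction is attempted.

```python
def fit_line(xs, ys):
    """Ordinary least squares of y on x; returns (intercept, slope)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return y_bar - slope * x_bar, slope


def rd_jump(xs, ys, cutoff, bandwidth):
    """Difference between right and left local linear fits at the cutoff."""
    left = [(x, y) for x, y in zip(xs, ys) if cutoff - bandwidth <= x < cutoff]
    right = [(x, y) for x, y in zip(xs, ys) if cutoff <= x <= cutoff + bandwidth]
    a_left, b_left = fit_line(*zip(*left))
    a_right, b_right = fit_line(*zip(*right))
    return (a_right + b_right * cutoff) - (a_left + b_left * cutoff)


# Hypothetical noiseless data on a grid of running-variable values.
xs = [i / 10 for i in range(-20, 21)]
outcome = [1 + 0.5 * x + (2.0 if x >= 0 else 0.0) for x in xs]  # jumps by 2 at 0
covariate = [3 + 0.2 * x for x in xs]                           # smooth at 0

effect = rd_jump(xs, outcome, cutoff=0.0, bandwidth=1.0)
placebo = rd_jump(xs, covariate, cutoff=0.0, bandwidth=1.0)

print("estimated jump in outcome:", round(effect, 6))
print("estimated jump in covariate:", round(placebo, 6))
```

A covariate jump that is clearly different from zero would suggest that units on either side of the cutoff differ in ways other than treatment.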
RDD
Fuzzy RDD
Estimand: LATE (local)
Continuity at threshold (partially testable)
Same as sharp RDD: both potential outcomes are continuous in the running variable at c.
Diagnostic: covariate smoothness; visual inspection of plots of the outcome and of treatment take-up against the forcing (running) variable.
No manipulation (testable)
Same as sharp RDD.
Diagnostic: McCrary density test.
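Monotonicity (partially testable)
Treatment probability weakly increases (or weakly decreases) at the cutoff for all units: no defiers.
Diagnostic: the first-stage jump in treatment probability at the cutoff should be significant and of the expected sign. This establishes relevance and direction; the absence of defiers must still be argued.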
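Under continuity and monotonicity, the fuzzy RDD estimand can be written as the ratio of the outcome jump to the treatment-probability jump at the cutoff. The expression below is the standard form of that ratio; the label τ_FRD is chosen here for illustration, with X the running variable, c the cutoff, D the treatment indicator, and Y the outcome.

```latex
\tau_{\mathrm{FRD}}
  = \frac{\lim_{x \downarrow c} E[Y \mid X = x] - \lim_{x \uparrow c} E[Y \mid X = x]}
         {\lim_{x \downarrow c} E[D \mid X = x] - \lim_{x \uparrow c} E[D \mid X = x]}
```

The denominator is the first-stage jump named in the monotonicity diagnostic. When it equals one, the expression reduces to the sharp RDD estimand.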
Match
Matching & IPW
Estimand: ATT or ATE
Conditional ignorability (CIA): (Y(1), Y(0)) ⊥⊥ D | X (not directly testable)
All common causes of D and Y are observed in X. Unobserved confounding would invalidate the estimate.
Diagnostic: Rosenbaum bounds (rbounds) quantify how strong hidden bias would need to be to explain away the result.
Overlap / positivity: 0 < P(D = 1 | X) < 1 for all X (testable)
Every covariate profile has a nonzero probability of both treatment and control. Propensity scores of 0 or 1 violate overlap.
Diagnostic: inspect propensity score distributions for treated and control. Trim or discard units outside common support.
SUTVA (not directly testable)
No interference between units; a single version of treatment.
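The overlap diagnostic and the IPW estimator can be sketched together for a single discrete covariate, where the propensity score is simply the treated share within each covariate cell. The toy data, the 0.05 trimming threshold, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data of (x, d, y), built as y = 1 + 2*x + 3*d (true ATE = 3).
# Treatment is more likely when x = 1, so the raw comparison is confounded.
data = (
    [(0, 1, 4.0)] * 4 + [(0, 0, 1.0)] * 6
    + [(1, 1, 6.0)] * 8 + [(1, 0, 3.0)] * 2
)


def cell_propensities(data):
    """Estimate P(D = 1 | X = x) as the treated share in each cell of x."""
    counts = defaultdict(lambda: [0, 0])  # x -> [n_treated, n_total]
    for x, d, _ in data:
        counts[x][0] += d
        counts[x][1] += 1
    return {x: treated / total for x, (treated, total) in counts.items()}


def overlap_violations(propensities, eps=0.05):
    """Covariate cells whose propensity is within eps of 0 or 1."""
    return [x for x, e in propensities.items() if e < eps or e > 1 - eps]


def ipw_ate(data, propensities):
    """Horvitz-Thompson inverse-probability-weighted ATE."""
    n = len(data)
    treated_term = sum(d * y / propensities[x] for x, d, y in data) / n
    control_term = sum((1 - d) * y / (1 - propensities[x]) for x, d, y in data) / n
    return treated_term - control_term


e_hat = cell_propensities(data)
bad_cells = overlap_violations(e_hat)
treated_mean = sum(y for _, d, y in data if d == 1) / sum(d for _, d, _ in data)
control_mean = sum(y for _, d, y in data if d == 0) / sum(1 - d for _, d, _ in data)
naive = treated_mean - control_mean
ate_hat = ipw_ate(data, e_hat)

print("propensities:", e_hat)
print("cells violating overlap:", bad_cells)
print("naive difference:", round(naive, 3))
print("IPW ATE:", round(ate_hat, 3))
```

The naive difference overstates the effect because treated units are concentrated in the x = 1 cell; reweighting by the cell propensities recovers the constructed effect. A cell with propensity exactly 0 or 1 would make the weights undefined, which is the overlap violation described above.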
SC
Synthetic control
Estimand: ATT
Pre-treatment fit (testable)
The synthetic control closely tracks the treated unit's pre-treatment outcome path. Poor fit implies poor counterfactual quality.
Diagnostic: inspect the MSPE of the pre-treatment fit. An MSPE that is large relative to the placebo fits obtained for donor units casts doubt on the design.
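Convex hull (testable)
The treated unit's pre-treatment characteristics lie within the convex hull of the donor pool. Outside the hull, matching the treated unit requires extrapolation, that is, negative weights or weights that do not sum to one.
Diagnostic: with the standard constraints (non-negative weights summing to one), a treated unit outside the hull shows up as poor pre-treatment fit. If weights are estimated without constraints, check for negative weights. Abadie (2021) recommends restricting to non-negative weights.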
No spillovers to donors (partially testable)
Treatment of the target unit does not affect donor units' outcomes (SUTVA at the aggregate level).
Diagnostic: exclude plausibly contaminated donors from the pool.
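The link between the convex hull condition, negative weights, and pre-treatment fit can be seen in the two-donor case, where the weight on the first donor has a closed form. This is a deliberately small sketch: real applications have many donors and need a constrained optimizer. The pre-treatment paths and function names are hypothetical.

```python
def unconstrained_weight(treated, donor_a, donor_b):
    """Least-squares weight on donor_a when the two weights must sum to one."""
    diff = [a - b for a, b in zip(donor_a, donor_b)]
    resid = [y - b for y, b in zip(treated, donor_b)]
    return sum(r * d for r, d in zip(resid, diff)) / sum(d * d for d in diff)


def synthetic_fit(treated, donor_a, donor_b):
    """Return (raw weight, constrained weight, pre-treatment MSPE)."""
    raw = unconstrained_weight(treated, donor_a, donor_b)
    # With one free weight, the constrained optimum is the raw weight
    # clipped to the interval [0, 1].
    w = min(1.0, max(0.0, raw))
    synthetic = [w * a + (1 - w) * b for a, b in zip(donor_a, donor_b)]
    mspe = sum((y - s) ** 2 for y, s in zip(treated, synthetic)) / len(treated)
    return raw, w, mspe


# Hypothetical pre-treatment outcome paths over three periods.
donor_a = [10.0, 11.0, 12.0]
donor_b = [20.0, 22.0, 24.0]
inside = [14.0, 15.4, 16.8]    # equals 0.6 * donor_a + 0.4 * donor_b
outside = [25.0, 27.5, 30.0]   # lies above both donors in every period

raw_in, w_in, mspe_in = synthetic_fit(inside, donor_a, donor_b)
raw_out, w_out, mspe_out = synthetic_fit(outside, donor_a, donor_b)

print("inside hull:  raw weight", round(raw_in, 3), "MSPE", round(mspe_in, 6))
print("outside hull: raw weight", round(raw_out, 3), "MSPE", round(mspe_out, 3))
```

For the unit inside the hull the raw weight already lies in [0, 1] and the fit is essentially exact. For the unit outside the hull the unconstrained weight is negative; clipping it to zero keeps the weights valid but leaves a large pre-treatment MSPE.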
DML
Double ML
Estimand: ATE or ATT
Conditional ignorability: (Y(1), Y(0)) ⊥⊥ D | X (not directly testable)
No unobserved confounders, given the high-dimensional covariate set X.
Overlap: 0 < P(D = 1 | X) < 1 (testable)
Propensity scores bounded away from 0 and 1.
Diagnostic: inspect the cross-fitted propensity score distribution.
Neyman orthogonality (not directly testable)
The score function has zero Gateaux derivative with respect to the nuisance parameters at the truth. This guarantees that nuisance estimation errors do not affect the target estimate at first order.
Diagnostic: holds by construction for the PLR and IRM scores; use the built-in DoubleML estimators.
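Nuisance convergence rate: faster than n^(−1/4) for each nuisance model (partially testable)
Cross-fitted ML models for E[Y|X] and E[D|X] must converge fast enough. Flexible learners (lasso, forests) can reach this rate, but only under conditions such as sparsity or smoothness, so it is not automatic.
Diagnostic: check the out-of-fold predictive performance of each nuisance model and whether the estimate is stable across learners. The outcome and treatment residuals are expected to be correlated with each other whenever the effect is nonzero, so their correlation is not a warning sign.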
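The cross-fitting step behind the partially linear (PLR) estimator can be sketched as follows. To stay dependency-free, the sketch uses a single covariate and plain least-squares lines as stand-ins for the ML learners; that substitution, the simulated data, and the function names are assumptions made for illustration. It is not the DoubleML package's implementation.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares of y on x; returns (intercept, slope)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return y_bar - slope * x_bar, slope


def cross_fit_residuals(xs, targets, folds):
    """Out-of-fold residuals: each fold is predicted by a fit on the other folds."""
    residuals = [0.0] * len(xs)
    for k, test_idx in enumerate(folds):
        train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
        a, b = fit_line([xs[i] for i in train_idx], [targets[i] for i in train_idx])
        for i in test_idx:
            residuals[i] = targets[i] - (a + b * xs[i])
    return residuals


def dml_plr(ys, ds, xs, n_folds=2, seed=0):
    """Residual-on-residual estimate of the treatment coefficient."""
    idx = list(range(len(ys)))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    res_y = cross_fit_residuals(xs, ys, folds)  # Y minus fitted E[Y|X]
    res_d = cross_fit_residuals(xs, ds, folds)  # D minus fitted E[D|X]
    return sum(v * u for v, u in zip(res_d, res_y)) / sum(v * v for v in res_d)


# Simulated data (hypothetical): X confounds D and Y, true effect is 1.5.
rng = random.Random(0)
theta = 1.5
xs = [rng.gauss(0, 1) for _ in range(2000)]
ds = [0.8 * x + rng.gauss(0, 1) for x in xs]
ys = [theta * d + 0.5 * x + rng.gauss(0, 1) for d, x in zip(ds, xs)]

theta_hat = dml_plr(ys, ds, xs)
naive_slope = fit_line(ds, ys)[1]  # regression of Y on D that ignores X

print("cross-fitted PLR estimate:", round(theta_hat, 3))
print("naive slope ignoring X:", round(naive_slope, 3))
```

The estimate is built from the product of the two residual series, which is why those residuals must be correlated when the effect is nonzero. The naive slope is pulled upward by the confounder in this simulation.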
CF
Causal forest
Estimand: CATE
Conditional ignorability: (Y(1), Y(0)) ⊥⊥ D | X (not directly testable)
No unobserved confounders given X.
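Overlap: η < P(D = 1 | X) < 1 − η for some η > 0 (testable)
Propensity bounded away from 0 and 1 uniformly in X.
Diagnostic: histogram of the estimated propensities, for example the W.hat values stored on a grf causal_forest object. Note that grf::get_forest_weights() returns the forest's kernel weights, not propensity scores.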
Honesty (testable)
Each tree uses separate subsamples for splitting and for estimation. Required for valid confidence intervals on CATEs.
Diagnostic: ensured by default in grf (honesty = TRUE). Set honesty = FALSE only for exploratory work.
DR
AIPW / DR learner
Estimand: ATE or CATE
Conditional ignorability (not directly testable)
No unobserved confounders given X.
Overlap (testable)
Propensity bounded away from 0 and 1.
Diagnostic: trim propensity scores at a small ε if needed.
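Double robustness (partially testable)
Consistent if either the outcome model or the propensity model is correctly specified; both need not be. Achieves semiparametric efficiency when both are consistent and the product of their convergence rates is faster than n^(−1/2), for example when each converges faster than n^(−1/4).
Diagnostic: cross-fit both nuisance models and check that covariates are balanced after weighting by the estimated propensities.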
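Double robustness can be checked directly on a small constructed example. The sketch below evaluates the AIPW estimator with a correct or a deliberately wrong version of each nuisance model. The toy data, the specific "wrong" models (outcome predictions of zero, propensity of 0.5 everywhere), and the function names are hypothetical, and cross-fitting is omitted because the nuisance functions are supplied rather than estimated.

```python
# Hypothetical toy data of (x, d, y), built as y = 1 + 2*x + 3*d (true ATE = 3).
data = (
    [(0, 1, 4.0)] * 4 + [(0, 0, 1.0)] * 6
    + [(1, 1, 6.0)] * 8 + [(1, 0, 3.0)] * 2
)


def aipw_ate(data, mu1, mu0, propensity):
    """Mean of the AIPW score: outcome-model contrast plus weighted residuals."""
    total = 0.0
    for x, d, y in data:
        e = propensity(x)
        total += (
            mu1(x) - mu0(x)
            + d * (y - mu1(x)) / e
            - (1 - d) * (y - mu0(x)) / (1 - e)
        )
    return total / len(data)


# Correct nuisance models for this data-generating process.
def mu1_true(x):
    return 4.0 + 2.0 * x


def mu0_true(x):
    return 1.0 + 2.0 * x


def e_true(x):
    return 0.8 if x == 1 else 0.4


# Deliberately misspecified nuisance models.
def mu_wrong(x):
    return 0.0


def e_wrong(x):
    return 0.5


outcome_right = aipw_ate(data, mu1_true, mu0_true, e_wrong)   # propensity wrong
propensity_right = aipw_ate(data, mu_wrong, mu_wrong, e_true)  # outcome wrong
both_wrong = aipw_ate(data, mu_wrong, mu_wrong, e_wrong)

print("outcome model right, propensity wrong:", round(outcome_right, 3))
print("propensity right, outcome model wrong:", round(propensity_right, 3))
print("both wrong:", round(both_wrong, 3))
```

Either correct model is enough to recover the constructed effect of 3. With both models wrong, the estimate drifts away from it, which is why the property is only partially testable in practice: the data do not reveal which, if either, model is right.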