CAUSAL METHODS
CHAPTER 01

DIFFERENCE-IN-DIFFERENCES (DiD)

Estimates causal effects by comparing changes over time between a treated and a control group. Identifies the ATT under the parallel trends assumption — no randomization required.

IDENTIFICATION SETUP

THE ESTIMAND

DiD targets the average treatment effect on the treated (ATT) — the average effect among units that actually received treatment. It does not recover the ATE without additional assumptions.

THE LOGIC

Take the change in outcomes for treated units over time, subtract the change for control units over the same period. What remains is the treatment effect — provided both groups would have followed the same trend absent treatment.

THE 2×2 CASE

                  Pre      Post
  Treated         Y¹₀      Y¹₁
  Control         Y⁰₀      Y⁰₁

  DiD estimate = (Y¹₁ − Y¹₀) − (Y⁰₁ − Y⁰₀)
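
The 2×2 estimate is just arithmetic on the four cell means. A minimal base-R sketch with made-up numbers (the values are purely illustrative):

```r
# Hypothetical cell means for the 2x2 design
y_treat_pre  <- 10.0   # Y¹₀
y_treat_post <- 12.5   # Y¹₁
y_ctrl_pre   <-  9.0   # Y⁰₀
y_ctrl_post  <- 10.0   # Y⁰₁

# DiD: change for treated minus change for controls
did_2x2 <- (y_treat_post - y_treat_pre) - (y_ctrl_post - y_ctrl_pre)
did_2x2   # 2.5 - 1.0 = 1.5
```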

CAUSAL STRUCTURE

[DAG: D (Treatment) → Y (Outcome); U (unit confounders) → D, Y; λt (time shocks) → Y; PT (parallel trends) licenses the ATT]

D → Y      Causal effect of interest (ATT)
U → D, Y   Unit-level confounders — absorbed by unit FE
λt → Y     Common time shocks — absorbed by time FE
PT         Parallel trends assumption — must hold
ASSUMPTIONS
Parallel trends (required)

In the absence of treatment, the average outcome for treated and control units would have followed the same trend over time. This is the core assumption of DiD; it cannot be tested in the post-period, only made plausible by checking pre-period trends.

HOW TO TEST

Pre-period event study. Inspect coefficients on relative time dummies before treatment — they should be statistically and economically close to zero.
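
Before the formal event study, a raw group-means plot is a quick visual check. A sketch assuming the `panel` data frame and column names from 01_data_prep.R:

```r
library(tidyverse)

# Average outcome by ever-treated status and year;
# the two lines should move in parallel before treatment begins
panel |>
  group_by(state_treated, year) |>
  summarise(mean_y = mean(outcome, na.rm = TRUE), .groups = "drop") |>
  ggplot(aes(year, mean_y, colour = factor(state_treated))) +
  geom_line() +
  geom_vline(xintercept = min(panel$treat_year, na.rm = TRUE),
             linetype = 2) +                 # earliest treatment year
  labs(colour = "Ever treated", y = "Mean outcome")
```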

No anticipation (required)

Units do not change behavior before treatment begins in anticipation of receiving it. Violated if firms or individuals respond to announced policies before they take effect.

HOW TO TEST

Pre-trend test at t−1, t−2. Statistically significant pre-period coefficients often signal anticipation effects.

SUTVA (required)

Potential outcomes for unit i depend only on unit i's treatment status — no spillovers to other units, and only one version of treatment exists.

HOW TO TEST

Theoretical argument. Check for geographic spillovers by testing outcomes in border regions.

Overlap (recommended)

Both treated and control groups exist throughout the panel. Pure time-series units (always treated) cannot contribute to identification.

HOW TO TEST

Inspect treatment timing distribution. Ensure sufficient control units across all time periods.
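
These two checks are a couple of lines each. A sketch assuming the `panel` columns from 01_data_prep.R:

```r
library(tidyverse)

# Distribution of first-treatment years; NA = never treated
panel |>
  distinct(unit_id, treat_year) |>
  count(treat_year)

# Control availability: untreated observations in each year
panel |>
  group_by(year) |>
  summarise(n_control = sum(treated == 0))
```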

DATA REQUIREMENTS

Panel structure

Repeated observations of the same units over time. Minimum: two periods (pre and post). Unbalanced panels work but require care.

Treatment variation

Some units treated, others not — or units treated at different times. Cross-sectional variation in treatment timing is the source of identification.

Outcome variable

Observed for all units in all periods. Should be the same measure pre and post. Level or log depending on the estimand and interpretation.

01_data_prep.R
library(tidyverse)

# Panel must have: unit id, time period, treatment indicator
panel <- read_csv("panel_data.csv") |>
  mutate(
    # Binary treatment: 1 if unit i is treated at time t
    treated = as.integer(state_treated & year >= treat_year),
    # Relative time: periods since (or before) treatment;
    # NA for never-treated units, which have no treat_year
    rel_time = year - treat_year
  )

# Check panel balance
panel |>
  count(unit_id, year) |>
  filter(n > 1)           # should be empty

TWFE ESTIMATOR

The canonical DiD estimator adds unit and time fixed effects to a regression of the outcome on a treatment dummy. Unit FEs control for all time-invariant confounders; time FEs absorb common shocks. The coefficient on treated is the ATT — under parallel trends and homogeneous effects across cohorts.

Yᵢₜ = αᵢ + λₜ + β · Dᵢₜ + εᵢₜ        β = ATT (under parallel trends)
02_twfe.R
library(fixest)

# Two-way fixed effects DiD
# Unit FE absorbs time-invariant differences
# Time FE absorbs common trends
fit <- feols(
  outcome ~ treated | unit_id + year,
  data = panel,
  cluster = ~unit_id          # cluster SEs at treatment level
)

summary(fit)
# coefplot(fit)               # visual coefficient plot

PRE-TREND TESTING & DIAGNOSTICS

The event-study specification replaces the single treatment dummy with a set of dummies for each period relative to treatment. Pre-treatment coefficients should be near zero — divergence suggests the parallel trends assumption fails before treatment begins.

Pre-trend F-test

Joint significance test on all pre-period coefficients. Rejection suggests pre-existing divergence, not just noise.
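
With fixest, the joint test is one call on the event-study fit from 03_pretrend.R; the `keep` regex below assumes the pre-period coefficient names start with "rel_time::-":

```r
library(fixest)

# Joint Wald/F-test that all pre-period event-study
# coefficients are simultaneously zero
wald(fit_es, keep = "rel_time::-")
```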

Sensitivity (Rambachan & Roth)

HonestDiD package tests how robust the ATT estimate is to bounded violations of parallel trends.
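
A sketch of the relative-magnitudes sensitivity analysis in HonestDiD; the period counts are assumptions that must match your event-study specification, and `fit_es` is the fit from 03_pretrend.R:

```r
library(HonestDiD)

# How large a post-period violation of parallel trends (relative to
# the largest pre-period deviation, scaled by Mbar) can the ATT survive?
sens <- createSensitivityResults_relativeMagnitudes(
  betahat        = coef(fit_es),
  sigma          = vcov(fit_es),
  numPrePeriods  = 4,              # assumed: 4 pre-period coefficients
  numPostPeriods = 3,              # assumed: 3 post-period coefficients
  Mbarvec        = seq(0.5, 2, by = 0.5)
)
sens
```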

Placebo treatment

Randomly reassign treatment status and re-estimate. ATT should be near zero on average — a distributional check.
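
One way to run the placebo check, assuming the `panel` columns from 01_data_prep.R and a fake treatment start at the median year:

```r
library(tidyverse)
library(fixest)

set.seed(42)
placebo_atts <- replicate(200, {
  # Randomly assign units to a fake treatment group
  fake_units <- panel |>
    distinct(unit_id) |>
    mutate(fake_group = rbinom(n(), 1, 0.5))
  fake <- panel |>
    left_join(fake_units, by = "unit_id") |>
    mutate(fake_treated = as.integer(fake_group == 1 &
                                     year >= median(year)))
  coef(feols(outcome ~ fake_treated | unit_id + year,
             data = fake))["fake_treated"]
})

mean(placebo_atts)                               # should be near zero
hist(placebo_atts, main = "Placebo ATT distribution")
```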

Alternate control group

Re-estimate using a different, arguably comparable, control group. Estimates should be similar if parallel trends holds broadly.

03_pretrend.R
# Event-study specification: interact relative time with the
# ever-treated indicator (state_treated), not the time-varying
# `treated` dummy, which is zero in every pre-period
# Never-treated units have NA rel_time; bin them out of range
# so they remain in the sample as controls
panel_es <- panel |>
  mutate(rel_time = replace_na(rel_time, -1000))

# ref = -1 sets the period before treatment as the baseline
fit_es <- feols(
  outcome ~ i(rel_time, state_treated, ref = c(-1, -1000)) | unit_id + year,
  data = panel_es,
  cluster = ~unit_id
)

# Plot — pre-period coefs should be near zero (parallel trends)
iplot(
  fit_es,
  main = "Event study: pre-trend test",
  xlab = "Periods relative to treatment",
  pt.join = TRUE
)
abline(h = 0, lty = 2, col = "grey60")

STAGGERED ADOPTION — CALLAWAY & SANT'ANNA

When units adopt treatment at different times, TWFE conflates ATTs across cohorts and periods — and can produce sign-reversed estimates when treatment effects are heterogeneous. The Callaway & Sant'Anna (2021) estimator computes group-time ATTs separately and aggregates them cleanly.

What changes

Instead of one β, you get ATT(g,t): the average effect for cohort g (first treated in year g) at calendar time t.

Aggregation options

Aggregate to a single ATT, a dynamic event-study plot, or group-specific effects — all from the same underlying estimates.

Control group

Use never-treated units if available. If not, not-yet-treated units can serve as controls with additional assumptions.

Why not TWFE

The Goodman-Bacon (2021) decomposition shows that TWFE is a weighted average of all 2×2 comparisons, some of which use already-treated units as controls; those comparisons can receive negative weights when effects vary across cohorts and over time.

04_callaway_santanna.R
library(did)

# Callaway & Sant'Anna (2021)
# Robust to heterogeneous treatment effects across cohorts
cs <- att_gt(
  yname = "outcome",
  tname = "year",
  idname = "unit_id",
  gname = "first_treat_year",   # 0 if never treated
  control_group = "nevertreated",
  data = panel
)

# Aggregate to overall ATT
aggte(cs, type = "simple")

# Or dynamic (event-study) aggregation
es <- aggte(cs, type = "dynamic")
ggdid(es)

OUTPUT INTERPRETATION

What does β = 0.04 mean?

It depends on the outcome's scale. If the outcome is a share or rate measured in proportions, β = 0.04 is a 4 percentage point increase for treated units relative to controls, after accounting for unit and time fixed effects. If the outcome is in logs, it is approximately a 4% increase. Always interpret in the units of your outcome variable.

My pre-period coefficients are non-zero — now what?

First check magnitude, not just significance. Small deviations with wide CIs may be noise. Large deviations suggest the parallel trends assumption fails — consider a different control group, covariate-adjusted DiD, or a different research design.

TWFE ATT differs from CS ATT — which do I report?

Report both and explain the difference. If treatment effects are homogeneous across cohorts, they should be close. Divergence is itself informative — it signals treatment effect heterogeneity across adoption cohorts.

Should I use clustered standard errors?

Yes — cluster at the level of treatment assignment (typically the state or firm). With fewer than ~30 clusters, consider wild cluster bootstrap or aggregation to the cluster level before estimation.
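
A sketch of the wild cluster bootstrap, assuming the fwildclusterboot package and the `fit` object from 02_twfe.R:

```r
library(fwildclusterboot)

# Wild cluster bootstrap inference on the treatment coefficient;
# useful when the number of clusters is small (< ~30)
boot <- boottest(
  fit,
  param   = "treated",
  clustid = "unit_id",
  B       = 9999            # bootstrap replications
)
summary(boot)
```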