Difference-in-differences
Estimates causal effects by comparing changes over time between a treated and a control group. Identifies the ATT under the parallel trends assumption — no randomization required.
THE ESTIMAND
DiD targets the average treatment effect on the treated (ATT) — the average effect among units that actually received treatment. It does not recover the ATE without additional assumptions.
THE LOGIC
Take the change in outcomes for treated units over time, subtract the change for control units over the same period. What remains is the treatment effect — provided both groups would have followed the same trend absent treatment.
THE 2×2 CASE
| | Pre | Post |
|---|---|---|
| Treated | Y¹₀ | Y¹₁ |
| Control | Y⁰₀ | Y⁰₁ |
| DiD estimate | (Y¹₁ − Y¹₀) − (Y⁰₁ − Y⁰₀) | |
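The double difference in the table is plain arithmetic. A minimal sketch with made-up group means (all numbers hypothetical):

```r
# Hypothetical group means, not from any real dataset
y_treated_pre  <- 10
y_treated_post <- 15
y_control_pre  <- 8
y_control_post <- 11

d_treated <- y_treated_post - y_treated_pre  # change for treated: 5
d_control <- y_control_post - y_control_pre  # change for control: 3

did <- d_treated - d_control  # (15 - 10) - (11 - 8) = 2
did
```

The control group's change (3) stands in for the counterfactual trend of the treated group; the remaining 2 units are the ATT, provided parallel trends holds.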
CAUSAL STRUCTURE
In the absence of treatment, the average outcome for treated and control units would have followed the same trend over time. This is the core assumption of DiD, and it cannot be tested in post-treatment periods; pre-period behavior offers only indirect evidence.
HOW TO TEST
Pre-period event study. Inspect coefficients on relative time dummies before treatment — they should be statistically and economically close to zero.
Units do not change behavior before treatment begins in anticipation of receiving it. Violated if firms or individuals respond to announced policies before they take effect.
HOW TO TEST
Pre-trend test at t−1, t−2. Statistically significant pre-period coefficients often signal anticipation effects.
Potential outcomes for unit i depend only on unit i's treatment status — no spillovers to other units, and only one version of treatment exists.
HOW TO TEST
Theoretical argument. Check for geographic spillovers by testing outcomes in border regions.
Both treated and control groups exist throughout the panel. Pure time-series units (always treated) cannot contribute to identification.
HOW TO TEST
Inspect treatment timing distribution. Ensure sufficient control units across all time periods.
Panel structure
Repeated observations of the same units over time. Minimum: two periods (pre and post). Unbalanced panels work but require care.
Treatment variation
Some units treated, others not — or units treated at different times. Cross-sectional variation in treatment timing is the source of identification.
Outcome variable
Observed for all units in all periods. Should be the same measure pre and post. Level or log depending on the estimand and interpretation.
library(tidyverse)
# Panel must have: unit id, time period, treatment indicator
panel <- read_csv("panel_data.csv") |>
  mutate(
    # Binary treatment: 1 if unit i is treated at time t
    treated = as.integer(state_treated & year >= treat_year),
    # Relative time: periods since/before treatment
    rel_time = year - treat_year
  )
# Check panel balance
panel |>
  count(unit_id, year) |>
  filter(n > 1) # should be empty

The canonical DiD estimator adds unit and time fixed effects to a regression of the outcome on a treatment dummy. Unit FEs control for all time-invariant confounders; time FEs absorb common shocks. The coefficient on treated is the ATT — under parallel trends and homogeneous effects across cohorts.
library(fixest)
# Two-way fixed effects DiD
# Unit FE absorbs time-invariant differences
# Time FE absorbs common trends
fit <- feols(
  outcome ~ treated | unit_id + year,
  data = panel,
  cluster = ~unit_id # cluster SEs at treatment level
)
summary(fit)
# coefplot(fit) # visual coefficient plotThe event-study specification replaces the single treatment dummy with a set of dummies for each period relative to treatment. Pre-treatment coefficients should be near zero — divergence suggests the parallel trends assumption fails before treatment begins.
Pre-trend F-test
Joint significance test on all pre-period coefficients. Rejection suggests pre-existing divergence, not just noise.
Sensitivity (Rambachan & Roth)
HonestDiD package tests how robust the ATT estimate is to bounded violations of parallel trends.
Placebo treatment
Randomly reassign treatment status and re-estimate. ATT should be near zero on average — a distributional check.
Alternate control group
Re-estimate using a different, arguably comparable, control group. Estimates should be similar if parallel trends holds broadly.
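Two of these checks can be sketched with fixest. The object and variable names (fit_es, panel, state_treated, treat_year) follow the code elsewhere on this page; the "rel_time::-" coefficient-name pattern produced by fixest's i() is an assumption worth verifying against coef(fit_es):

```r
library(tidyverse)
library(fixest)

# 1. Pre-trend F-test: joint Wald test on the pre-period coefficients.
#    With i(rel_time, treated, ref = -1), pre-period coefficient names
#    contain "rel_time::-" (e.g. "rel_time::-3:treated").
wald(fit_es, keep = "rel_time::-")  # small p-value: pre-existing divergence

# 2. Placebo treatment: permute treatment assignment across units
#    and re-estimate; placebo ATTs should be centered near zero.
placebo_atts <- replicate(200, {
  assign <- distinct(panel, unit_id, state_treated, treat_year)
  i <- sample(nrow(assign))  # shuffle (state_treated, treat_year) jointly
  assign$state_treated <- assign$state_treated[i]
  assign$treat_year    <- assign$treat_year[i]
  p <- panel |>
    select(unit_id, year, outcome) |>
    left_join(assign, by = "unit_id") |>
    mutate(treated = as.integer(state_treated & year >= treat_year))
  coef(feols(outcome ~ treated | unit_id + year, data = p))["treated"]
})
hist(placebo_atts)  # the real ATT should sit far in the tail
```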
# Event-study specification: interact treatment with relative time
# ref = -1 sets the period before treatment as the baseline
fit_es <- feols(
  outcome ~ i(rel_time, treated, ref = -1) | unit_id + year,
  data = panel,
  cluster = ~unit_id
)
# Plot — pre-period coefs should be near zero (parallel trends)
iplot(
  fit_es,
  main = "Event study: pre-trend test",
  xlab = "Periods relative to treatment",
  pt.join = TRUE
)
abline(h = 0, lty = 2, col = "grey60")

When units adopt treatment at different times, TWFE conflates ATTs across cohorts and periods — and can produce sign-reversed estimates when treatment effects are heterogeneous. Callaway & Sant'Anna (2021) estimates group-time ATTs separately and aggregates cleanly.
What changes
Instead of one β, you get ATT(g,t): the average effect for cohort g (first treated in year g) at calendar time t.
Aggregation options
Aggregate to a single ATT, a dynamic event-study plot, or group-specific effects — all from the same underlying estimates.
Control group
Use never-treated units if available. If not, not-yet-treated units can serve as controls with additional assumptions.
Why not TWFE
Goodman-Bacon (2021) decomposes TWFE into a weighted average of all 2×2 comparisons; some comparisons can receive negative weights when already-treated units serve as controls and effects vary over time.
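The decomposition can be inspected directly; the bacondecomp package is one implementation (the variable names below follow the rest of this page and are assumptions about your data):

```r
library(bacondecomp)

# Decompose the TWFE estimate into its underlying 2x2 comparisons.
# bacon() expects a binary treatment indicator and a balanced panel.
bd <- bacon(outcome ~ treated,
            data = panel,
            id_var = "unit_id",
            time_var = "year")

# Each row is one 2x2 comparison with its estimate and TWFE weight.
# Watch for "later vs earlier treated" comparisons carrying large weight:
# those are the ones that contaminate TWFE under heterogeneous effects.
head(bd)
```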
library(did)
# Callaway & Sant'Anna (2021)
# Robust to heterogeneous treatment effects across cohorts
cs <- att_gt(
  yname = "outcome",
  tname = "year",
  idname = "unit_id",
  gname = "first_treat_year", # 0 if never treated
  control_group = "nevertreated",
  data = panel
)
# Aggregate to overall ATT
aggte(cs, type = "simple")
# Or dynamic (event-study) aggregation
es <- aggte(cs, type = "dynamic")
ggdid(es)

What does β = 0.04 mean?
It depends on the outcome's scale, after accounting for unit and time fixed effects. In levels, β = 0.04 is a 0.04-unit increase for treated units relative to controls; if the outcome is a proportion, that is 4 percentage points; if the outcome is in logs, it is roughly a 4% increase. Interpret in the units of your outcome variable.
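For a log outcome, the exact percent change implied by a coefficient β is exp(β) − 1, a one-line check in R:

```r
beta <- 0.04
pct <- exp(beta) - 1   # exact percent change implied by a log-outcome coefficient
round(100 * pct, 2)    # approximately 4.08
```

For small coefficients the approximation β ≈ percent change is close; it deteriorates as β grows.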
My pre-period coefficients are non-zero — now what?
First check magnitude, not just significance. Small deviations with wide CIs may be noise. Large deviations suggest the parallel trends assumption fails; consider a different control group, covariate-adjusted DiD, or a different research design.
TWFE ATT differs from CS ATT — which do I report?
Report both and explain the difference. If treatment effects are homogeneous across cohorts, they should be close. Divergence is itself informative — it signals treatment effect heterogeneity across adoption cohorts.
Should I use clustered standard errors?
Yes — cluster at the level of treatment assignment (typically the state or firm). With fewer than ~30 clusters, consider wild cluster bootstrap or aggregation to the cluster level before estimation.
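With few clusters, the wild cluster bootstrap is one option; the fwildclusterboot package integrates with fixest. This is a sketch under the variable names used elsewhere on this page, and the argument names are worth checking against the package documentation:

```r
library(fixest)
library(fwildclusterboot)

fit <- feols(outcome ~ treated | unit_id + year, data = panel)

# Wild cluster bootstrap inference on the treatment coefficient,
# clustering at the unit (treatment-assignment) level
boot <- boottest(fit,
                 param   = "treated",
                 clustid = "unit_id",
                 B       = 9999)  # bootstrap replications
summary(boot)
```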