Event study
Estimates treatment effects at each period relative to adoption, producing a time-path of causal effects. Tests parallel trends in pre-periods and reveals whether effects are immediate, delayed, or fading.
An event study replaces the single DiD treatment dummy with a vector of dummies for each period relative to treatment, typically −k to +k, with the period before treatment (t=−1) omitted as the baseline.
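Written out, one common form of this specification is below. The notation is ours, not the source's: $G_i$ is unit $i$'s first treated period, $D_i = 1$ for ever-treated units, $\alpha_i$ and $\lambda_t$ are unit and calendar-period fixed effects, $K$ is the window half-width, and the text's relative time corresponds to $e$ here.

$$
Y_{it} = \alpha_i + \lambda_t + \sum_{\substack{e=-K \\ e \neq -1}}^{K} \beta_e \, D_i \, \mathbf{1}[t - G_i = e] + \varepsilon_{it}
$$

In the simplest balanced case with a single adoption date, each $\beta_e$ is the treated-minus-control gap $e$ periods from adoption minus the same gap at $e = -1$. With binned endpoints, the $e = -K$ and $e = K$ indicators become $\mathbf{1}[t - G_i \le -K]$ and $\mathbf{1}[t - G_i \ge K]$.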
DiD gives one average post-treatment effect. An event study gives a separate estimate for each post-period, revealing dynamics, anticipation, and fadeout, plus pre-period estimates that test the parallel trends assumption.
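To make the contrast concrete, here is a small simulation sketch. It builds a panel with a single adoption date and a made-up effect path (1, 2, 3, 4, 5 in post-periods 0 to 4), then estimates both a single-dummy DiD and an event study by OLS with unit and year dummies. The sample sizes, effect sizes and column names are illustrative assumptions. It uses base R `lm()` instead of fixest so it runs without extra packages.

```r
# Illustrative only: 40 units over 10 years, half adopt in year 6
set.seed(1)
n_units <- 40
adopt_year <- 6
panel <- expand.grid(unit = 1:n_units, year = 1:10)
panel$treated <- as.integer(panel$unit <= n_units / 2)
panel$rel_time <- panel$year - adopt_year

# Hypothetical true effect path: 0 before adoption, then 1, 2, 3, 4, 5
true_effect <- ifelse(panel$rel_time >= 0, panel$rel_time + 1, 0)
panel$y <- rnorm(n_units)[panel$unit] +   # unit effects
  0.3 * panel$year +                       # common time trend
  panel$treated * true_effect +            # treatment effect (treated units only)
  rnorm(nrow(panel), sd = 0.1)             # noise

# Static DiD: a single treated x post dummy with unit and year fixed effects
panel$post <- as.integer(panel$rel_time >= 0)
did <- lm(y ~ I(treated * post) + factor(unit) + factor(year), data = panel)
did_est <- unname(coef(did)["I(treated * post)"])

# Event study: one treated x relative-time dummy per period, t = -1 omitted
rel_periods <- setdiff(sort(unique(panel$rel_time)), -1)
ev_cols <- ifelse(rel_periods < 0, paste0("ev_m", -rel_periods),
                  paste0("ev_p", rel_periods))
for (i in seq_along(rel_periods)) {
  panel[[ev_cols[i]]] <- as.integer(panel$treated == 1 &
                                      panel$rel_time == rel_periods[i])
}
es <- lm(reformulate(c(ev_cols, "factor(unit)", "factor(year)"), response = "y"),
         data = panel)
es_est <- setNames(unname(coef(es)[ev_cols]), rel_periods)

did_est          # one number: close to 3, the average of the post-period path
round(es_est, 2) # one estimate per period: near 0 before adoption, near 1..5 after
```

The single DiD number hides the fact that the effect grows every period; the event-study coefficients recover the path and add four pre-period estimates that can be checked against zero.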
Run it alongside every DiD. The event study is both a validity check (pre-trends) and a richer characterization of the treatment effect trajectory. A flat pre-period followed by a rising post-period is the canonical clean result.
STYLIZED EVENT STUDY PLOT
Parallel trends
Same as DiD: absent treatment, treated and control units would have followed the same outcome trajectory. The pre-period event-study coefficients are the primary empirical check.
HOW TO TEST
Inspect the pre-period coefficients visually and via a joint F-test. All should be statistically indistinguishable from zero.
No anticipation
Units do not adjust behavior before treatment begins. If they do, pre-period coefficients will be non-zero even if parallel trends holds, making it hard to separate anticipation from pre-trend failure.
HOW TO TEST
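Check the t = −2 and t = −3 coefficients specifically. Deviations concentrated in the last periods before adoption, with earlier coefficients flat, suggest anticipation; a steady slope running through all pre-periods suggests trend divergence instead.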
Valid reference period
The omitted period (usually t = −1) must be a clean pre-treatment period. If treatment effects begin before the nominal start (e.g., announcement effects), the baseline is contaminated.
HOW TO TEST
Try omitting t = −2 or t = −3 as the baseline instead. If the coefficients shift substantially, the baseline choice is driving the estimates and t = −1 may already be affected by treatment.
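One way to see what this check measures: when the relative-time dummies (binned endpoints included) cover every treated observation and the model has unit fixed effects, moving the omitted period only re-normalizes the estimates, so every coefficient shifts by the estimate at the new reference period. The sketch below applies that identity to a vector of point estimates; the function name `rebase_event_study` and the numbers are hypothetical.

```r
# Re-express event-study point estimates relative to a different omitted period.
# Assumes `coefs` is named by relative period, includes the old reference
# (e.g. "-1" = 0), and that the dummies partition the treated observations.
rebase_event_study <- function(coefs, new_ref) {
  key <- as.character(new_ref)
  if (!key %in% names(coefs)) stop("new_ref is not among the estimated periods")
  coefs - coefs[[key]]
}

# Hypothetical estimates with t = -1 as the baseline
est_ref_m1 <- c("-3" = 0.02, "-2" = 0.10, "-1" = 0, "0" = 0.50, "1" = 0.80)
est_ref_m2 <- rebase_event_study(est_ref_m1, new_ref = -2)
est_ref_m2
```

Under these assumptions the size of the shift is exactly the t = −2 estimate from the original fit, so a large shift means t = −1 and t = −2 differ. Standard errors do change with the reference period, so inference still requires re-estimating (e.g. `ref = -2` in fixest's `i()`).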
Adequate support in each bin
Early and late relative-time bins can be sparse if treatment timing varies widely. Bin the endpoints (e.g., cap at t = ±4) to avoid noisy estimates from few observations.
HOW TO TEST
Check the number of units in each relative-time bin. As a rough rule of thumb, bins with fewer than ~20 units should be collapsed into the endpoint bins.
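Counting distinct units (not rows) per bin can be sketched in base R as below. The ±4 cap follows the text; the toy data and the lowered threshold of 3 units are assumptions chosen so the example is small.

```r
# Toy panel: 6 units observed 2000-2009; NA marks never-treated units
first_treat <- c(2002, 2004, 2006, 2008, NA, NA)
panel <- expand.grid(unit_id = 1:6, year = 2000:2009)
panel$first_treat_year <- first_treat[panel$unit_id]

# Relative time with endpoints capped at -4 and +4 (NA stays NA)
panel$rel_time <- panel$year - panel$first_treat_year
panel$rel_time_binned <- pmin(pmax(panel$rel_time, -4), 4)

# Distinct treated units contributing to each bin
treated_rows <- panel[!is.na(panel$rel_time_binned), ]
units_per_bin <- tapply(treated_rows$unit_id, treated_rows$rel_time_binned,
                        function(ids) length(unique(ids)))

# Flag sparse bins: the text suggests ~20 units; 3 is used for the toy data
min_units <- 3
sparse_bins <- names(units_per_bin)[units_per_bin < min_units]
units_per_bin
sparse_bins
```

Here only the +4 bin is flagged: just two units are observed four or more years after adoption, so the cap would need to move to +3 or the endpoint estimate accepted as noisy.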
Relative time variable
A variable measuring periods since treatment for each unit. Requires knowing the treatment date for every treated unit, not just a post-period dummy.
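If the data only carry a post-treatment dummy, each unit's treatment date can be recovered as the first period in which the dummy equals one, provided treatment is absorbing (once treated, always treated). A base-R sketch with made-up data; the column name `post` is a hypothetical stand-in for your treatment dummy.

```r
# Toy panel with a unit-by-year treatment dummy; rows sorted by year within unit
panel <- data.frame(
  unit_id = rep(1:3, each = 4),
  year    = rep(2001:2004, times = 3),
  post    = c(0, 0, 1, 1,  0, 1, 1, 1,  0, 0, 0, 0)
)

# Assumption check: treatment is absorbing (the dummy never switches back to 0)
switches_off <- tapply(panel$post, panel$unit_id, function(d) any(diff(d) < 0))
stopifnot(!any(switches_off))

# First year with post == 1 for each unit; never-treated units get NA
treated_obs <- panel$post == 1
first_by_unit <- tapply(panel$year[treated_obs], panel$unit_id[treated_obs], min)
idx <- match(panel$unit_id, as.integer(names(first_by_unit)))
panel$first_treat_year <- as.vector(first_by_unit)[idx]

# Relative time (NA for never-treated units)
panel$rel_time <- panel$year - panel$first_treat_year
panel
```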
Sufficient pre-periods
At least 2–3 pre-treatment periods to meaningfully test parallel trends. More is better: a flat pre-trend over 4+ periods is stronger evidence than one over 1–2 periods.
Treatment timing variation
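Needed only for staggered designs, in which units are treated at different times. If every treated unit adopts in the same period, relative time coincides with calendar time and the design is a standard DiD with treatment × period interactions: dynamics are still estimable against a never-treated group, but the staggered-adoption estimators discussed below are unnecessary.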
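library(tidyverse)
# Requires the same panel structure as DiD
# Key addition: relative time variable
# Assumes never-treated units have first_treat_year = NA in the raw file
panel <- read_csv("panel_data.csv") |>
mutate(
# Ever-treated indicator, interacted with the relative-time dummies below
treated = as.integer(!is.na(first_treat_year)),
# Relative time: periods since treatment (negative = pre, positive = post)
rel_time = year - first_treat_year,
# Cap endpoints to avoid sparse bins (plain numbers keep one numeric type).
# Never-treated units get the reference value -1 instead of NA so feols()
# keeps them as controls; treated = 0 zeroes their interaction terms anyway.
rel_time_binned = case_when(
is.na(rel_time) ~ -1,
rel_time <= -4 ~ -4,
rel_time >= 4 ~ 4,
TRUE ~ rel_time
)
)
# Check: distribution of relative time among treated units
panel |> filter(treated == 1) |> count(rel_time_binned)
The event-study estimator adds relative-time × treatment interaction dummies to the TWFE regression, with t = −1 omitted as the baseline. Each coefficient estimates the effect at that horizon relative to t = −1; it can be read as an ATT under parallel trends and no anticipation and, with staggered timing, only if effects are homogeneous across adoption cohorts.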
library(fixest)
# Event-study specification
# i() creates relative-time × treatment interactions
# ref = -1 sets the period immediately before treatment as baseline
fit_es <- feols(
outcome ~ i(rel_time_binned, treated, ref = -1) | unit_id + year,
data = panel,
cluster = ~unit_id
)
# Coefficient plot with confidence intervals
iplot(
fit_es,
main = "Event study",
xlab = "Periods relative to treatment",
pt.join = TRUE,
ci.lwd = 1.5
)
abline(h = 0, lty = 2, col = "grey60")
abline(v = -0.5, lty = 3, col = "grey60") # treatment onset
Visual inspection
Plot all pre-period coefficients with 95% CIs. They should overlap zero and show no systematic trend. The eye test is often sufficient: a clearly sloping pre-period is a red flag regardless of p-values.
Joint F-test
Formally test H₀: all pre-period coefficients = 0. The test has low power with few pre-periods, so a non-rejection is not strong evidence of parallel trends, only the absence of a detectable violation.
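The joint test is a Wald test of the pre-period coefficients against zero. As a sketch of the standard statistic (not necessarily the exact degrees-of-freedom convention fixest's `wald()` uses), take the pre-period coefficient vector $b$ and its clustered covariance block $V$: $W = b^\top V^{-1} b$ is compared with a χ² distribution with one degree of freedom per coefficient, or $W/q$ with an F distribution. The estimates, covariance and 50-cluster assumption below are made up.

```r
# Joint Wald test that all pre-period event-study coefficients are zero
pretrend_wald <- function(b, V, df2 = Inf) {
  q <- length(b)
  w <- drop(t(b) %*% solve(V, b))      # W = b' V^{-1} b
  if (is.finite(df2)) {
    stat <- w / q                      # F form: W / q compared with F(q, df2)
    p <- pf(stat, q, df2, lower.tail = FALSE)
  } else {
    stat <- w                          # chi-square form: W compared with chi2(q)
    p <- pchisq(stat, q, lower.tail = FALSE)
  }
  list(stat = stat, df = q, p_value = p)
}

# Hypothetical estimates for t = -4, -3, -2 with a diagonal covariance
b <- c(0.05, -0.02, 0.03)
V <- diag(c(0.04, 0.03, 0.02)^2)
pretrend_wald(b, V)            # chi-square version
pretrend_wald(b, V, df2 = 49)  # F version, assuming 50 clusters minus one
```

Using the full covariance block rather than the individual standard errors is the point of the joint test: event-study coefficients share the t = −1 baseline, so they are correlated.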
Rambachan & Roth (2023)
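HonestDiD assesses how robust the post-period estimates are to bounded violations of parallel trends. You specify M, the maximum allowed deviation (under the default smoothness restriction, the largest change in the slope of the differential trend between consecutive periods), and it reports robust confidence intervals across values of M.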
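For reference, the smoothness class that `createSensitivityResults()` works with, as we read the HonestDiD documentation, is below. Notation assumed: $\delta_t$ is the parallel-trends violation at relative period $t$, with the reference period normalized to zero.

$$
\Delta^{SD}(M) = \left\{ \delta : \left| (\delta_{t+1} - \delta_t) - (\delta_t - \delta_{t-1}) \right| \le M \ \text{for all } t \right\}
$$

At $M = 0$ the violation must be exactly linear, so a linear pre-trend is extrapolated into the post-period; larger $M$ admits more curvature and widens the robust confidence interval. Because $M$ is measured in the outcome's units, the grid passed as `Mvec` should be scaled to the size of the estimates.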
Placebo test
Shift each treated unit's treatment date to a fake date inside its pre-period, drop the actual post-treatment observations, and estimate the event study again. The coefficients for this fake event should be flat; any pattern reveals pre-existing dynamics.
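A minimal sketch of building the placebo sample, assuming `unit_id`, `year` and `first_treat_year` (NA for never-treated) columns as in the code in this section. Real post-treatment rows are dropped so genuine effects cannot leak into the placebo estimates, and each treated unit's adoption date is moved `shift` periods earlier. The function name `make_placebo_panel` and the toy data are hypothetical.

```r
# Build a placebo panel: keep only genuinely untreated observations and
# pretend treatment started `shift` periods before the real date
make_placebo_panel <- function(panel, shift = 2) {
  # Rows of treated units on or after their real adoption date are dropped
  is_post <- !is.na(panel$first_treat_year) & panel$year >= panel$first_treat_year
  placebo <- panel[!is_post, ]
  placebo$first_treat_year <- placebo$first_treat_year - shift
  placebo$rel_time <- placebo$year - placebo$first_treat_year
  placebo
}

# Toy data: unit 1 adopts in 2006, unit 2 never adopts
toy <- expand.grid(unit_id = 1:2, year = 2001:2008)
toy$first_treat_year <- ifelse(toy$unit_id == 1, 2006, NA)
placebo <- make_placebo_panel(toy, shift = 2)
placebo[placebo$unit_id == 1, ]
```

The placebo panel can then go through the same binning and event-study code; the fake post-period coefficients (here at fake horizons 0 and 1) should be close to zero.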
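library(fixest)
# Joint F-test on all pre-period coefficients
# Null: all pre-period effects are zero (parallel trends holds)
# i() names them like "rel_time_binned::-4:treated"
pre_coefs <- grep("^rel_time_binned::-[0-9]+:treated$",
names(coef(fit_es)), value = TRUE)
# Wald test (keep takes regular expressions selecting coefficients)
wald(fit_es, keep = pre_coefs)
# Sensitivity: Rambachan & Roth (2023)
# install.packages("HonestDiD")
library(HonestDiD)
# Pre-period coefficients come first (-4, -3, -2), then post (0 to 4);
# the two counts must add up to length(betahat)
betahat <- coef(fit_es)
sigma <- vcov(fit_es)
n_pre <- length(pre_coefs)
n_post <- length(betahat) - n_pre
# Mvec is in outcome units; by default the target is the t = 0 effect
honest_did <- createSensitivityResults(
betahat = betahat,
sigma = sigma,
numPrePeriods = n_pre,
numPostPeriods = n_post,
Mvec = seq(0, 0.05, by = 0.01)
)
# Original (non-robust) interval, which the plot needs for comparison
original_ci <- constructOriginalCS(
betahat = betahat,
sigma = sigma,
numPrePeriods = n_pre,
numPostPeriods = n_post
)
createSensitivityPlot(honest_did, original_ci)
With staggered treatment timing, a standard TWFE event study also uses already-treated units as controls, so each coefficient can mix in effects from other cohorts and other relative periods; when effects differ across cohorts, the estimated dynamics (and even the pre-trends) can be distorted. Two robust alternatives are the Callaway–Sant'Anna dynamic aggregation and the Sun–Abraham interaction-weighted estimator.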
Callaway–Sant'Anna (2021)
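Estimates a separate ATT for each adoption cohort and period (group-time ATTs), using only never-treated or not-yet-treated units as controls, then aggregates them by event time. This keeps cohort-specific effects from contaminating each other. Preferred when treatment effect heterogeneity across cohorts is a concern.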
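To show what a group-time ATT is, here is a stripped-down teaching version. It assumes no covariates, a balanced panel, never-treated units as the comparison group, and the period before adoption (g − 1) as each cohort's base period. ATT(g, t) is the change in cohort g's mean outcome from g − 1 to t, minus the same change for never-treated units. This is not the `did` package's estimator, which adds inference, covariates, other comparison groups and weighting options; for pre-periods it mirrors a fixed g − 1 baseline rather than the package's default short differences. The function names and toy data are hypothetical.

```r
# Group-time ATT without covariates: change for cohort g from g - 1 to t,
# minus the same change for never-treated units (NA first_treat)
att_gt_simple <- function(panel, g, t) {
  mean_y <- function(rows, yr) mean(panel$y[rows & panel$year == yr])
  cohort <- !is.na(panel$first_treat) & panel$first_treat == g
  never  <- is.na(panel$first_treat)
  (mean_y(cohort, t) - mean_y(cohort, g - 1)) -
    (mean_y(never, t) - mean_y(never, g - 1))
}

# Event-time aggregation: cohort-size-weighted average of ATT(g, g + e)
# over the cohorts whose g - 1 and g + e both fall inside the sample
dynamic_att <- function(panel, e) {
  cohorts <- sort(unique(panel$first_treat[!is.na(panel$first_treat)]))
  cohorts <- cohorts[(cohorts + e) %in% panel$year & (cohorts - 1) %in% panel$year]
  atts <- vapply(cohorts, function(g) att_gt_simple(panel, g, g + e), numeric(1))
  sizes <- vapply(cohorts, function(g)
    as.numeric(length(unique(panel$unit[panel$first_treat %in% g]))), numeric(1))
  sum(atts * sizes) / sum(sizes)
}

# Toy staggered panel without noise: cohorts 2004 and 2006 plus never-treated
# units; assumed effects grow by 1 per period for 2004 and by 2 for 2006
panel <- expand.grid(unit = 1:30, year = 2001:2008)
panel$first_treat <- rep(c(2004, 2006, NA), each = 10)[panel$unit]
rel <- panel$year - panel$first_treat
slope <- ifelse(panel$first_treat %in% 2004, 1, 2)
effect <- ifelse(!is.na(rel) & rel >= 0, slope * (rel + 1), 0)
panel$y <- panel$unit + 0.5 * panel$year + effect

sapply(-2:3, function(e) dynamic_att(panel, e))
```

Unit and year effects cancel in each comparison, so the group-time ATTs recover the cohort-specific effects exactly. At e = 3 only the 2004 cohort is observed, so that average silently changes composition.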
Sun & Abraham (2021)
An interaction-weighted estimator implementable directly in fixest via sunab(). Produces cohort-robust event-study plots within the familiar TWFE framework. Faster to implement.
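library(did)
library(fixest)
library(tidyverse)
# did expects never-treated units coded 0 (assumes NA in the raw data)
panel_cs <- panel |>
mutate(first_treat_year = replace_na(first_treat_year, 0))
# Callaway & Sant'Anna group-time ATTs (idname must be numeric)
# base_period = "universal" would mirror the fixed t = -1 baseline used above
cs <- att_gt(
yname = "outcome",
tname = "year",
idname = "unit_id",
gname = "first_treat_year",
control_group = "nevertreated",
data = panel_cs
)
# Dynamic aggregation = event-study plot
es_agg <- aggte(cs, type = "dynamic", min_e = -4, max_e = 4)
ggdid(es_agg, title = "CS event study: dynamic ATT")
# Sun & Abraham (2021) alternative via fixest
# A cohort value outside the sample period (here 10000) marks never-treated units
panel_sa <- panel |>
mutate(cohort_sa = replace_na(first_treat_year, 10000))
fit_sa <- feols(
outcome ~ sunab(cohort_sa, year) | unit_id + year,
data = panel_sa,
cluster = ~unit_id
)
iplot(fit_sa)
Pre-period coefficients are small but one is significant. Should I be worried?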
One marginally significant pre-period coefficient out of several is often noise, so judge the pre-period coefficients jointly rather than one at a time: the joint F-test matters more than any single period. Also check magnitude: a small but statistically significant pre-trend may be economically negligible relative to the post-period effects.
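The arithmetic behind "often noise": if parallel trends holds exactly and each of k pre-period coefficients is tested at the 5% level, treating the tests as independent gives a 1 − 0.95^k chance that at least one is significant, about 23% with five coefficients. Under joint normality, Šidák's inequality makes this independent-case figure an upper bound, since correlation between the coefficients can only lower it.

```r
# Probability that at least one of k true-null pre-period coefficients
# is "significant" at the 5% level, treating the k tests as independent
k <- 2:6
p_at_least_one <- 1 - 0.95^k
round(setNames(p_at_least_one, paste0("k=", k)), 3)
```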
My post-period effects are increasing, does that mean the treatment is strengthening?
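Possibly, but rule out compositional changes first. Rising effects can also reflect selection: with staggered adoption, the longest horizons are estimated only from the earliest-adopting cohorts, so the set of units behind each coefficient changes along the plot, and if early adopters have larger gains the path rises even when no unit's effect grows. Also check for Ashenfelter's dip in the pre-period: a temporary drop just before treatment lowers the t = −1 baseline, so the later recovery mechanically inflates post-period estimates.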
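A quick way to see the composition issue is to tabulate which adoption cohorts are observed at each event time. Within a fixed calendar window, late adopters run out of post-periods first. The cohorts and window below are made up; with real data the same table comes from the `first_treat_year` and `year` columns. Restricting to cohorts observed at every reported horizon (the `balance_e` argument of `did::aggte()` does this for the CS estimator) is one common remedy.

```r
# Which adoption cohorts are observed at each post-adoption event time,
# given a fixed 2000-2010 sample window? (cohort years are hypothetical)
years <- 2000:2010
cohorts <- c(2003, 2006, 2009)

grid <- expand.grid(cohort = cohorts, year = years)
grid$event_time <- grid$year - grid$cohort

# Event times 0..6 and the cohorts that contribute to each
post <- grid[grid$event_time >= 0 & grid$event_time <= 6, ]
contributors <- tapply(post$cohort, post$event_time,
                       function(g) paste(sort(g), collapse = ","))
contributors
```

All three cohorts contribute at event times 0 and 1, two from event time 2 to 4, and only the 2003 cohort at 5 and 6.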
The TWFE event study and the CS dynamic aggregation look different, which do I report?
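Report both. Divergence between the two is informative: it usually signals treatment effect heterogeneity across adoption cohorts, which biases the TWFE coefficients (differences in comparison group and base period can also contribute). Under such heterogeneity the CS version is the more credible estimate. Explain the difference in your methods section.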
How many pre-periods do I need?
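At minimum two, but three or four provide much stronger evidence. With only one pre-period, that period is the omitted baseline, so there is no pre-period coefficient to test at all; with two, the single testable coefficient cannot distinguish a true parallel trend from a brief convergence. The longer the flat pre-period, the more credible the design.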