Event study
Estimates treatment effects at each period relative to adoption, producing a time-path of causal effects. Tests parallel trends in pre-periods and reveals whether effects are immediate, delayed, or fading.
An event study replaces the single DiD treatment dummy with a vector of dummies for each period relative to treatment — typically −k to +k, with the period before treatment (t=−1) omitted as the baseline.
DiD gives one average post-treatment effect. An event study gives a separate estimate for each post-period — revealing dynamics, anticipation, and fadeout — plus pre-period estimates that test the parallel trends assumption.
Always alongside a DiD. The event study is both a validity check (pre-trends) and a richer characterization of the treatment effect trajectory. A flat pre-period and rising post-period is the canonical clean result.
[Figure: stylized event-study plot — flat pre-period coefficients, rising post-period effects]
Same as DiD: absent treatment, treated and control units would have followed the same outcome trajectory. The pre-period event-study coefficients are the primary empirical check.
HOW TO TEST
Inspect pre-period coefficients visually and via joint F-test. All should be statistically indistinguishable from zero.
Units do not adjust behavior before treatment begins. If they do, pre-period coefficients will be non-zero even if parallel trends holds — making it hard to separate anticipation from pre-trend failure.
HOW TO TEST
Check t = −2, −3 coefficients specifically. A pattern of rising pre-period estimates suggests anticipation rather than trend divergence.
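A quick way to run this check, sketched against the `fit_es` event-study fit estimated later on this page (fixest labels `i()` terms as `variable::level:interaction`):

```r
library(fixest)

# Sketch: inspect the coefficients just before treatment
# (assumes fit_es from the event-study specification on this page)
ct <- coeftable(fit_es)
ct[grepl("::-3:|::-2:", rownames(ct)), ]
# Same-signed, growing estimates at t = -3, -2, with flat earlier
# periods, point to anticipation rather than a diverging trend
```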
The omitted period (usually t = −1) must be a clean pre-treatment period. If treatment effects begin before the nominal start (e.g., announcement effects), the baseline is contaminated.
HOW TO TEST
Try omitting t = −2 or t = −3 as the baseline. If coefficients shift substantially, the baseline choice is affecting estimates.
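A minimal sketch of this robustness check, assuming the panel data and specification from this page's code:

```r
library(fixest)

# Sketch: re-estimate with an earlier omitted period and compare paths
# (assumes the `panel` data and variable names from this page's code)
fit_ref1 <- feols(outcome ~ i(rel_time_binned, treated, ref = -1) | unit_id + year,
                  data = panel, cluster = ~unit_id)
fit_ref2 <- feols(outcome ~ i(rel_time_binned, treated, ref = -2) | unit_id + year,
                  data = panel, cluster = ~unit_id)

# Overlay both coefficient paths; large shifts in the post-period
# estimates suggest the t = -1 baseline is contaminated
iplot(list(fit_ref1, fit_ref2))
```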
Early and late relative-time bins can be sparse if treatment timing varies widely. Bin the endpoints (e.g., cap at t = ±4) to avoid noisy estimates from few observations.
HOW TO TEST
Check observation counts per relative-time bin. Bins with fewer than ~20 units should be collapsed.
Relative time variable
A variable measuring periods since treatment for each unit. Requires knowing the treatment date for every treated unit — not just a post-period dummy.
Sufficient pre-periods
At least 2–3 pre-treatment periods to meaningfully test parallel trends. More is better — a flat pre-trend over 4+ periods is stronger evidence than 1–2 periods.
Treatment timing variation
For staggered-robust estimators, units must be treated at different times. With a single common treatment date there is no cohort variation — the design reduces to a standard two-group event study, and the cohort-robust corrections below offer no advantage.
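A one-line diagnostic for timing variation, assuming the panel data used in this page's code:

```r
library(tidyverse)

# Sketch: tabulate adoption cohorts (assumes the `panel` data from this page)
panel |>
  distinct(unit_id, first_treat_year) |>
  count(first_treat_year, name = "n_units")
# Several distinct adoption years -> staggered estimators apply;
# one adoption year -> standard event study against never-treated units
```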
library(tidyverse)
# Requires the same panel structure as DiD
# Key addition: relative time variable
panel <- read_csv("panel_data.csv") |>
  mutate(
    # Relative time: periods since treatment (negative = pre, positive = post)
    rel_time = year - first_treat_year,
    # Cap endpoints to avoid sparse bins
    rel_time_binned = case_when(
      rel_time <= -4 ~ -4L,
      rel_time >= 4 ~ 4L,
      TRUE ~ rel_time
    )
  )
# Check: distribution of relative time
count(panel, rel_time_binned)

The event-study estimator adds relative-time × treatment interaction dummies to the TWFE regression, with t = −1 omitted as the baseline. Each coefficient gives the ATT at that horizon relative to the pre-treatment period.
library(fixest)
# Event-study specification
# i() creates relative-time × treatment interactions
# ref = -1 sets the period immediately before treatment as baseline
fit_es <- feols(
  outcome ~ i(rel_time_binned, treated, ref = -1) | unit_id + year,
  data = panel,
  cluster = ~unit_id
)
# Coefficient plot with confidence intervals
iplot(
  fit_es,
  main = "Event study",
  xlab = "Periods relative to treatment",
  pt.join = TRUE,
  ci.lwd = 1.5
)
abline(h = 0, lty = 2, col = "grey60")
abline(v = -0.5, lty = 3, col = "grey60") # treatment onset

Visual inspection
Plot all pre-period coefficients with 95% CIs. They should overlap zero and show no systematic trend. The eye test is often sufficient — a clearly sloping pre-period is a red flag regardless of p-values.
Joint F-test
Formally test H₀: all pre-period coefficients = 0. Low power with few pre-periods; a non-rejection is not strong evidence of parallel trends, just absence of detectable violation.
Rambachan & Roth (2023)
HonestDiD tests how robust post-period estimates are to bounded violations of parallel trends. Specifies the maximum allowable deviation M and reports sensitivity across values of M.
Placebo test
Assign a fake treatment date in the pre-period and estimate the event study. Pre-trend coefficients for this fake event should be flat — any pattern reveals pre-existing dynamics.
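One way to implement the placebo, as a sketch using the variable names from this page's code; the 3-period shift is an arbitrary illustrative choice:

```r
library(tidyverse)
library(fixest)

# Sketch: shift treatment 3 periods early and keep only data from before
# the real onset, so any estimated "effect" must be a pre-existing trend.
# Assumes the `panel` data from this page, with first_treat_year coded
# for treated units.
placebo <- panel |>
  filter(treated == 0 | year < first_treat_year) |>
  mutate(fake_rel_time = if_else(treated == 1,
                                 year - (first_treat_year - 3),
                                 -1))  # controls parked at the reference

fit_placebo <- feols(
  outcome ~ i(fake_rel_time, treated, ref = -1) | unit_id + year,
  data = placebo,
  cluster = ~unit_id
)
iplot(fit_placebo, main = "Placebo event study (fake onset)")
```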
library(fixest)
# Joint F-test on all pre-period coefficients
# Null: all pre-period effects are zero (parallel trends holds)
pre_coefs <- names(coef(fit_es)) |>
  (\(x) x[grepl("rel_time_binned::-", x)])()
# Wald test (keep matches coefficient names as regular expressions)
wald(fit_es, keep = pre_coefs)
# Sensitivity: Rambachan & Roth (2023)
# install.packages("HonestDiD")
library(HonestDiD)
honest_did <- createSensitivityResults(
  betahat = coef(fit_es),
  sigma = vcov(fit_es),
  numPrePeriods = 3,  # rel_time -4 to -2 (t = -1 is the omitted baseline)
  numPostPeriods = 5, # rel_time 0 to 4
  Mvec = seq(0, 0.05, by = 0.01)
)
# Original (non-robust) confidence set, needed for the plot
original_cs <- constructOriginalCS(
  betahat = coef(fit_es),
  sigma = vcov(fit_es),
  numPrePeriods = 3,
  numPostPeriods = 5
)
createSensitivityPlot(honest_did, original_cs)

With staggered treatment timing, a standard TWFE event study conflates ATTs across cohorts and can produce distorted dynamics. Two robust alternatives are the Callaway–Sant'Anna dynamic aggregation and the Sun–Abraham interaction-weighted estimator.
Callaway–Sant'Anna (2021)
Estimates group-time ATTs separately and aggregates dynamically. Cleanly separates cohort effects from calendar-time effects. Preferred when treatment effect heterogeneity across cohorts is a concern.
Sun & Abraham (2021)
An interaction-weighted estimator implementable directly in fixest via sunab(). Produces cohort-robust event-study plots within the familiar TWFE framework. Faster to implement.
library(did)
# Callaway & Sant'Anna event-study aggregation
cs <- att_gt(
  yname = "outcome",
  tname = "year",
  idname = "unit_id",
  gname = "first_treat_year",
  control_group = "nevertreated",
  data = panel
)
# Dynamic aggregation = event-study plot
es_agg <- aggte(cs, type = "dynamic", min_e = -4, max_e = 4)
ggdid(es_agg, title = "CS event study: dynamic ATT")
# Sun & Abraham (2021) alternative via fixest
fit_sa <- feols(
  outcome ~ sunab(first_treat_year, year) | unit_id + year,
  data = panel,
  cluster = ~unit_id
)
iplot(fit_sa)

Pre-period coefficients are small but one is significant — should I be worried?
One marginally significant pre-period coefficient out of several is often noise. Evaluate jointly, not individually. A joint F-test matters more than any single period. Also check magnitude — a small but statistically significant pre-trend may be economically negligible.
My post-period effects are increasing — does that mean the treatment is strengthening?
Possibly, but rule out compositional changes first. Rising effects can also reflect selection — if marginal treated units are added over time, the composition of the treated group changes. Also check for Ashenfelter's dip in the pre-period, which can mechanically inflate post-period estimates.
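A compositional diagnostic, sketched with the panel data from this page: tabulate which adoption cohorts contribute at each post-treatment horizon.

```r
library(tidyverse)

# Sketch: which cohorts contribute at each post-treatment horizon?
# (assumes the `panel` data from this page) Long horizons identified
# only by early adopters can make rising effects a composition artifact.
panel |>
  filter(treated == 1, rel_time >= 0) |>
  count(first_treat_year, rel_time) |>
  pivot_wider(names_from = rel_time, values_from = n)
```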
The TWFE event study and the CS dynamic aggregation look different — which do I report?
Report both. Divergence between the two is informative — it signals treatment effect heterogeneity across adoption cohorts. The CS version is the more credible estimate. Explain the difference in your methods section.
How many pre-periods do I need?
At minimum two, but three or four provides much stronger evidence. With one pre-period you cannot distinguish a true parallel trend from a brief convergence. The longer the flat pre-period, the more credible the design.