Causal Methods
CHAPTER 02
ES

Event study

Estimates treatment effects at each period relative to adoption, producing a time-path of causal effects. Tests parallel trends in pre-periods and reveals whether effects are immediate, delayed, or fading.

IDENTIFICATION SETUP
01 What it is

An event study replaces the single DiD treatment dummy with a vector of dummies for each period relative to treatment — typically −k to +k, with the period before treatment (t=−1) omitted as the baseline.
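
A minimal sketch of what that dummy vector looks like, using toy values unrelated to the dataset used later:

# Toy illustration: relative time from -2 to +2, with t = -1 as the
# omitted baseline; model.matrix() shows one dummy column per period
rel_time <- factor(c(-2, -1, 0, 1, 2))
rel_time <- relevel(rel_time, ref = "-1")
model.matrix(~ rel_time)  # columns: intercept, then dummies for -2, 0, 1, 2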

02 What it adds over DiD

DiD gives one average post-treatment effect. An event study gives a separate estimate for each post-period — revealing dynamics, anticipation, and fadeout — plus pre-period estimates that test the parallel trends assumption.

03 When to use it

Always alongside a DiD. The event study is both a validity check (pre-trends) and a richer characterization of the treatment effect trajectory. A flat pre-period and rising post-period is the canonical clean result.

STYLIZED EVENT STUDY PLOT

[Figure: stylized event-study plot. Coefficients plotted from t = −4 to t = +4 periods relative to treatment; reference period at t = −1, treatment onset at t = 0.]

t < 0: Pre-period — should be near zero if parallel trends holds
t = −1: Omitted baseline — all effects measured relative to this period
t ≥ 0: Post-period — treatment effect at each horizon
Shaded: Pre-treatment window — check for anticipation or divergence
ASSUMPTIONS
Parallel trends (required)

Same as DiD: absent treatment, treated and control units would have followed the same outcome trajectory. The pre-period event-study coefficients are the primary empirical check.

HOW TO TEST

Inspect pre-period coefficients visually and via joint F-test. All should be statistically indistinguishable from zero.

No anticipation (required)

Units do not adjust behavior before treatment begins. If they do, pre-period coefficients will be non-zero even if parallel trends holds — making it hard to separate anticipation from pre-trend failure.

HOW TO TEST

Check t = −2, −3 coefficients specifically. A pattern of rising pre-period estimates suggests anticipation rather than trend divergence.
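
A quick way to eyeball this (a sketch reusing the fit_es model estimated in 02_event_study.R below; coefficient names follow fixest's i() convention):

library(fixest)

# Pull just the pre-period coefficients from the event-study fit;
# a monotone rise toward t = 0 points to anticipation, a steady slope
# across all pre-periods points to trend divergence
est <- coef(fit_es)
est[grep("rel_time_binned::-", names(est))]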

Correct baseline period (required)

The omitted period (usually t = −1) must be a clean pre-treatment period. If treatment effects begin before the nominal start (e.g., announcement effects), the baseline is contaminated.

HOW TO TEST

Try omitting t = −2 or t = −3 as the baseline. If coefficients shift substantially, the baseline choice is affecting estimates.
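
For example (a sketch, assuming the panel object and fit_es from the code blocks below):

library(fixest)

# Re-estimate with t = -2 as the omitted baseline instead of t = -1
fit_es_alt <- feols(
  outcome ~ i(rel_time_binned, treated, ref = -2) | unit_id + year,
  data = panel,
  cluster = ~unit_id
)

# Overlay both normalizations; large shifts suggest a contaminated baseline
iplot(list(fit_es, fit_es_alt))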

Sufficient support at endpoints (recommended)

Early and late relative-time bins can be sparse if treatment timing varies widely. Bin the endpoints (e.g., cap at t = ±4) to avoid noisy estimates from few observations.

HOW TO TEST

Check observation counts per relative-time bin. Bins with fewer than ~20 units should be collapsed.
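
A sketch of the check, assuming the panel object built in 01_data_prep.R below:

library(tidyverse)

# Distinct units contributing to each relative-time bin
panel |>
  distinct(unit_id, rel_time_binned) |>
  count(rel_time_binned, name = "n_units")
# Widen the caps in rel_time_binned for any bin with n_units below ~20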

DATA REQUIREMENTS

Relative time variable

A variable measuring periods since treatment for each unit. Requires knowing the treatment date for every treated unit — not just a post-period dummy.

Sufficient pre-periods

At least 2–3 pre-treatment periods to meaningfully test parallel trends. More is better — a flat pre-trend over 4+ periods is stronger evidence than 1–2 periods.

Treatment timing variation

For staggered designs, units must be treated at different times. Same-period treatment collapses to a standard DiD — no event-study dynamics are estimable.
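
A quick check, assuming the panel object built below with an ever-treated dummy treated:

library(tidyverse)

# Number of distinct adoption cohorts among ever-treated units;
# a single cohort means no staggered variation to exploit
panel |>
  filter(treated == 1) |>
  summarise(n_cohorts = n_distinct(first_treat_year))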

01_data_prep.R
library(tidyverse)

# Requires the same panel structure as DiD
# Key addition: relative time variable
panel <- read_csv("panel_data.csv") |>
  mutate(
    # Relative time: periods since treatment (negative = pre, positive = post)
    # Never-treated units need a non-missing value here (feols drops NA rows);
    # any placeholder works since treated = 0 zeroes out their dummies
    rel_time = year - first_treat_year,
    # Cap endpoints to avoid sparse bins
    rel_time_binned = case_when(
      rel_time <= -4 ~ -4L,
      rel_time >=  4 ~  4L,
      TRUE           ~ rel_time
    )
  )

# Check: distribution of relative time
count(panel, rel_time_binned)
SPECIFICATION & ESTIMATION

The event-study estimator adds relative-time × treatment interaction dummies to the TWFE regression, with t = −1 omitted as the baseline. Each coefficient gives the ATT at that horizon relative to the pre-treatment period.

Y_it = α_i + λ_t + Σ_{k≠−1} β_k · 𝟙[K_it = k] + ε_it, where β_k is the ATT at horizon k and K_it is unit i's period relative to treatment
02_event_study.R
library(fixest)

# Event-study specification
# i() creates relative-time × treatment interactions
# ref = -1 sets the period immediately before treatment as baseline
fit_es <- feols(
  outcome ~ i(rel_time_binned, treated, ref = -1) | unit_id + year,
  data = panel,
  cluster = ~unit_id
)

# Coefficient plot with confidence intervals
iplot(
  fit_es,
  main = "Event study",
  xlab = "Periods relative to treatment",
  pt.join = TRUE,
  ci.lwd = 1.5
)
abline(h = 0, lty = 2, col = "grey60")
abline(v = -0.5, lty = 3, col = "grey60")  # treatment onset
PRE-TREND TESTING & DIAGNOSTICS

Visual inspection

Plot all pre-period coefficients with 95% CIs. They should overlap zero and show no systematic trend. The eye test is often sufficient — a clearly sloping pre-period is a red flag regardless of p-values.

Joint F-test

Formally test H₀: all pre-period coefficients = 0. Low power with few pre-periods; a non-rejection is not strong evidence of parallel trends, just absence of detectable violation.

Rambachan & Roth (2023)

HonestDiD tests how robust post-period estimates are to bounded violations of parallel trends. Specifies the maximum allowable deviation M and reports sensitivity across values of M.

Placebo test

Assign a fake treatment date in the pre-period and estimate the event study. Pre-trend coefficients for this fake event should be flat — any pattern reveals pre-existing dynamics (sketched after the code block below).

03_pretrend_test.R
library(fixest)

# Joint F-test on all pre-period coefficients
# Null: all pre-period effects are zero (parallel trends holds)
# i() coefficients are named like "rel_time_binned::-4:treated"
pre_coefs <- grep("rel_time_binned::-", names(coef(fit_es)), value = TRUE)
pre_coefs

# Wald test on the pre-period terms (keep takes a regular expression)
wald(fit_es, keep = "rel_time_binned::-")

# Sensitivity analysis: Rambachan & Roth (2023)
# install.packages("HonestDiD")
library(HonestDiD)

# betahat/sigma must hold only the event-study coefficients, ordered pre
# then post; with bins -4..4 and ref = -1 that is 3 pre and 5 post periods
honest_did <- createSensitivityResults(
  betahat = coef(fit_es),
  sigma = vcov(fit_es),
  numPrePeriods = 3,
  numPostPeriods = 5,
  Mvec = seq(0, 0.05, by = 0.01)
)

# The plot needs the unadjusted confidence set for comparison
original_cs <- constructOriginalCS(
  betahat = coef(fit_es),
  sigma = vcov(fit_es),
  numPrePeriods = 3,
  numPostPeriods = 5
)
createSensitivityPlot(honest_did, original_cs)
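
The placebo test from the diagnostics list above can be sketched as follows (assuming never-treated units carry NA in first_treat_year; the 3-period shift is an arbitrary illustrative choice):

library(fixest)
library(tidyverse)

# Keep only pre-treatment observations so the real effect cannot leak in,
# then pretend treatment started 3 periods earlier than it actually did
placebo <- panel |>
  filter(is.na(first_treat_year) | year < first_treat_year) |>
  mutate(
    rel_time_fake = year - (first_treat_year - 3),
    # park never-treated units at the reference level; their dummies are
    # zeroed out by treated = 0 anyway
    rel_time_fake = replace_na(rel_time_fake, -1)
  )

fit_placebo <- feols(
  outcome ~ i(rel_time_fake, treated, ref = -1) | unit_id + year,
  data = placebo,
  cluster = ~unit_id
)
iplot(fit_placebo)  # coefficients for the fake event should be flat at zero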
STAGGERED ADOPTION

With staggered treatment timing, a standard TWFE event study conflates ATTs across cohorts and can produce distorted dynamics. Two robust alternatives are the Callaway–Sant'Anna dynamic aggregation and the Sun–Abraham interaction-weighted estimator.

Callaway–Sant'Anna (2021)

Estimates group-time ATTs separately and aggregates dynamically. Cleanly separates cohort effects from calendar-time effects. Preferred when treatment effect heterogeneity across cohorts is a concern.

Sun & Abraham (2021)

An interaction-weighted estimator implementable directly in fixest via sunab(). Produces cohort-robust event-study plots within the familiar TWFE framework. Faster to implement.

04_staggered_es.R
library(did)

# Callaway & Sant'Anna event-study aggregation
cs <- att_gt(
  yname = "outcome",
  tname = "year",
  idname = "unit_id",
  gname = "first_treat_year",
  control_group = "nevertreated",
  data = panel
)

# Dynamic aggregation = event-study plot
es_agg <- aggte(cs, type = "dynamic", min_e = -4, max_e = 4)
ggdid(es_agg, title = "CS event study: dynamic ATT")

# Sun & Abraham (2021) alternative via fixest
fit_sa <- feols(
  outcome ~ sunab(first_treat_year, year) | unit_id + year,
  data = panel,
  cluster = ~unit_id
)
iplot(fit_sa)
OUTPUT INTERPRETATION

Pre-period coefficients are small but one is significant — should I be worried?

One marginally significant pre-period coefficient out of several is often noise. Evaluate jointly, not individually. A joint F-test matters more than any single period. Also check magnitude — a small but statistically significant pre-trend may be economically negligible.

My post-period effects are increasing — does that mean the treatment is strengthening?

Possibly, but rule out compositional changes first. Rising effects can also reflect selection — if marginal treated units are added over time, the composition of the treated group changes. Also check for Ashenfelter's dip in the pre-period, which can mechanically inflate post-period estimates.
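
A quick visual check for the dip (a sketch, assuming the panel object from 01_data_prep.R):

library(tidyverse)

# Raw mean outcome by relative time for ever-treated units; a drop just
# before t = 0 is the classic Ashenfelter's dip pattern
panel |>
  filter(treated == 1) |>
  group_by(rel_time_binned) |>
  summarise(mean_outcome = mean(outcome, na.rm = TRUE)) |>
  ggplot(aes(rel_time_binned, mean_outcome)) +
  geom_point() +
  geom_line() +
  geom_vline(xintercept = -0.5, linetype = "dashed")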

The TWFE event study and the CS dynamic aggregation look different — which do I report?

Report both. Divergence between the two is informative — it signals treatment effect heterogeneity across adoption cohorts. The CS version is the more credible estimate. Explain the difference in your methods section.
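
A quick way to put them side by side (a sketch reusing fit_es and fit_sa from the code above; fixest's iplot accepts a list of models):

library(fixest)

# Overlay the naive TWFE event study and the Sun-Abraham version;
# visible divergence signals effect heterogeneity across cohorts
iplot(list(fit_es, fit_sa), main = "TWFE vs. Sun-Abraham")
# legend symbols chosen to match iplot's defaults; adjust if needed
legend(
  "topleft",
  legend = c("TWFE", "Sun-Abraham"),
  col = 1:2, pch = c(20, 17)
)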

How many pre-periods do I need?

At minimum two, but three or four provide much stronger evidence. With one pre-period you cannot distinguish a true parallel trend from a brief convergence. The longer the flat pre-period, the more credible the design.