Causal Methods
CHAPTER 04
RDD

Regression discontinuity

Exploits sharp cutoffs in assignment rules to identify causal effects. Units just above and just below a threshold are treated as comparable, and the jump in outcomes at the cutoff estimates a local treatment effect.

IDENTIFICATION SETUP
01 The running variable

A continuous variable that determines treatment. Units above a cutoff receive treatment; those below do not. Examples: exam scores, income relative to an eligibility threshold, age relative to a policy age.

02 The identifying assumption

Units just above and just below the cutoff are comparable in all other respects. Any jump in outcomes at the threshold is attributable to treatment — not to systematic differences between units near the cutoff.

03 What it estimates

The local average treatment effect at the cutoff — not the ATE. The effect may not generalize away from the threshold. Whether the LATE at the cutoff is policy-relevant depends on the research question.
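
The logic of comparing units near the cutoff can be illustrated with a small simulation. This is a minimal sketch on made-up data: the variable names, the cutoff of 50, the window of 2 points, and the true jump of 2 are all assumptions chosen for illustration, not part of any real dataset.

```r
# Simulated sharp design; every name and number here is an illustrative assumption
set.seed(1)
n      <- 2000
cutoff <- 50
running_var <- runif(n, 0, 100)
treated     <- as.integer(running_var >= cutoff)
true_jump   <- 2                                   # treatment effect at the cutoff
outcome     <- 1 + 0.05 * running_var + true_jump * treated + rnorm(n, sd = 1)

# Comparing ALL treated with ALL control units mixes the jump with the trend
naive <- mean(outcome[treated == 1]) - mean(outcome[treated == 0])

# Comparing only units within 2 points of the cutoff is much closer to the jump
near  <- abs(running_var - cutoff) <= 2
local <- mean(outcome[near & treated == 1]) - mean(outcome[near & treated == 0])

c(naive = naive, local = local, truth = true_jump)
```

The naive difference absorbs the upward trend in the running variable, while the narrow-window difference is close to the simulated jump. A simple difference in means still carries a small trend bias inside the window, which is why local regression is normally used instead.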

SHARP RDD — VISUALIZED

[Figure: sharp RDD. Outcome against the running variable (centered at the cutoff, axis from -30 to 30), showing treated and control observations, the counterfactual trend, and the jump τ at the cutoff.]
Cutoff: The threshold where treatment assignment switches
τ: The discontinuity, i.e. the estimated treatment effect at the cutoff
Counterfactual: Where the control trend would have gone without the jump
Bandwidth: Window around the cutoff used for estimation; smaller = more local

Sharp RDD

Treatment is a deterministic function of the running variable: every unit at or above the cutoff is treated, every unit below is not. The jump in outcomes directly identifies the average treatment effect for units at the threshold (a local effect, not the population ATE).

Fuzzy RDD

Treatment probability jumps at the cutoff but compliance is imperfect — some units above don't take treatment, some below do. Equivalent to local IV: the cutoff instruments for actual take-up, recovering a LATE.

ASSUMPTIONS
Continuity at threshold (required)

In the absence of treatment, potential outcomes are continuous through the cutoff. This is plausible when units cannot precisely sort themselves to just above or just below the threshold to receive or avoid treatment.

HOW TO TEST

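Density test (rddensity): checks whether the density of the running variable is continuous at the cutoff. A spike in density just above it suggests strategic sorting. The rddensity package implements the Cattaneo-Jansson-Ma version of the test; the original is due to McCrary. Continuity of potential outcomes cannot be tested directly, so this is indirect evidence.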

No manipulation (required)

Units cannot precisely control their value of the running variable to fall on a preferred side of the cutoff. Fuzzy sorting (where units influence their value but not precisely) is less concerning than exact manipulation.

HOW TO TEST

Density test is the primary check. Also inspect the running variable histogram visually — a suspicious gap or spike near the cutoff is a red flag.
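
The visual inspection can be sketched in base R as follows. The data are simulated without manipulation, and the window width `win` and the bin width are arbitrary assumptions. The binomial comparison of counts is only a crude illustration that assumes a roughly flat density near the cutoff; it is not the test implemented in rddensity.

```r
# Simulated running variable with no manipulation (illustrative assumption)
set.seed(3)
x_centered <- runif(5000, -30, 30)

# Histogram with bin edges aligned to the cutoff so that no bin straddles it
hist(x_centered, breaks = seq(-30, 30, by = 1),
     main = "Running variable", xlab = "Centered running variable")
abline(v = 0, lty = 2)

# Crude check: within a narrow window, counts on each side should be similar
win     <- 2
n_below <- sum(x_centered >= -win & x_centered < 0)
n_above <- sum(x_centered >= 0 & x_centered <= win)
bt <- binom.test(n_above, n_above + n_below, p = 0.5)
bt$p.value
```

A very small p-value here would be a prompt to look more closely with the formal density test, not a conclusion in itself.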

Local continuity of covariates (required)

Pre-treatment covariates should not jump at the cutoff. If they do, the units just above and below differ in ways unrelated to treatment, invalidating the comparison.

HOW TO TEST

Estimate the RD on each pre-treatment covariate as a placebo outcome. Coefficients should be statistically and economically close to zero.

Correct functional form (recommended)

The local polynomial used to model outcomes on each side of the cutoff should fit the data well. Misspecification — e.g. forcing linearity on a curved relationship — can bias the discontinuity estimate.

HOW TO TEST

Try different polynomial orders (p=1, p=2). Check that estimates are stable across order. Prefer local linear (p=1) with optimal bandwidth to higher-order polynomials.
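
A minimal sketch of the polynomial-order comparison, using weighted least squares in base R on simulated data with mild curvature. The bandwidth `h = 15`, the true jump of 2, and all variable names are assumptions for illustration. In rdrobust the polynomial order is set through its `p` argument; the manual version below only shows the idea.

```r
# Simulated data with mild curvature; all names and values are illustrative
set.seed(4)
n <- 4000
x <- runif(n, -30, 30)                      # centered running variable
d <- as.integer(x >= 0)                     # sharp treatment indicator
y <- 5 + 0.2 * x + 0.004 * x^2 + 2 * d + rnorm(n, sd = 1)   # true jump = 2

h    <- 15                                  # hand-picked bandwidth (assumption)
w    <- pmax(0, 1 - abs(x) / h)             # triangular kernel weights
keep <- w > 0

# p = 1: separate lines on each side; p = 2: separate quadratics on each side
fit_p1 <- lm(y ~ d * x, weights = w, subset = keep)
fit_p2 <- lm(y ~ d * (x + I(x^2)), weights = w, subset = keep)

est <- c(p1 = unname(coef(fit_p1)["d"]), p2 = unname(coef(fit_p2)["d"]))
est
```

If the two estimates differ by much more than their standard errors, the functional form near the cutoff deserves a closer look.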

DATA REQUIREMENTS

Running variable

A continuous variable with a known, externally determined cutoff. Must be observed for all units. Bunching at round numbers or the cutoff itself warrants scrutiny.

Sharp cutoff

Treatment assignment must switch at a specific value. The cutoff should be determined by a rule external to the units — not endogenously set by administrators with discretion.

Sufficient density near cutoff

Estimation is local to the threshold. Sufficient observations in the optimal bandwidth on both sides are needed for precise estimates. Sparse data near the cutoff inflates standard errors.

01_data_prep.R
library(tidyverse)

data <- read_csv("rdd_data.csv")

# Center the running variable at the cutoff
# Makes threshold = 0, which is convention for rdrobust
cutoff <- 50   # e.g. an exam score, income threshold, age
data <- data |>
  mutate(
    x_centered = running_var - cutoff,
    treated = as.integer(running_var >= cutoff)
  )

# Visual check: raw outcome vs running variable
ggplot(data, aes(x = x_centered, y = outcome)) +
  geom_point(alpha = 0.3, size = 0.8) +
  geom_vline(xintercept = 0, linetype = "dashed", color = "grey60") +
  labs(x = "Running variable (centered)", y = "Outcome")
ESTIMATION

The standard estimator fits local linear regressions on each side of the cutoff using observations within an optimal bandwidth. The discontinuity estimate is the difference in fitted values at the cutoff. Bandwidth selection balances bias (wider bandwidth, more misfit) against variance (narrower bandwidth, fewer observations).

Local linear estimator
Y_i = α + τ · 1(X_i ≥ c) + β(X_i − c) + γ · 1(X_i ≥ c)(X_i − c) + ε_i
τ = discontinuity at cutoff c  ·  estimated using observations with |X_i − c| ≤ h, i.e. within bandwidth h on each side
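
The local linear specification above can be written out by hand as a kernel-weighted regression. This is a sketch on simulated data, assuming a hand-picked bandwidth `h = 10` and a triangular kernel. It reproduces only the conventional point estimate, not the data-driven bandwidth choice or the bias-corrected inference that rdrobust provides.

```r
# Simulated data; names, bandwidth and effect size are illustrative assumptions
set.seed(2)
n <- 3000
x <- runif(n, -30, 30)                 # running variable, centered so c = 0
d <- as.integer(x >= 0)                # sharp assignment indicator 1(X_i >= c)
y <- 10 + 0.3 * x + 0.2 * d * x + 3 * d + rnorm(n, sd = 1)   # true tau = 3

h    <- 10                             # hand-picked bandwidth (assumption)
w    <- pmax(0, 1 - abs(x) / h)        # triangular kernel weights, zero outside h
keep <- w > 0

# y ~ d * x expands to alpha + tau * d + beta * x + gamma * d * x
fit     <- lm(y ~ d * x, weights = w, subset = keep)
tau_hat <- unname(coef(fit)["d"])      # estimated discontinuity at the cutoff
tau_hat
```

The coefficient on `d` plays the role of τ, and the interaction term lets the slope differ on each side of the cutoff.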

SHARP RDD

02_sharp_rdd.R
library(rdrobust)

# Sharp RDD: treatment deterministically assigned at cutoff
# rdrobust selects bandwidth optimally (MSE-optimal, CCT 2014)
fit_sharp <- rdrobust(
  y = data$outcome,
  x = data$x_centered,
  c = 0,              # cutoff (already centered)
  kernel = "triangular", # triangular kernel downweights distant obs
  bwselect = "mserd"     # MSE-optimal bandwidth (default)
)

summary(fit_sharp)

# RD plot: binned scatter with fitted polynomials on each side
rdplot(
  y = data$outcome,
  x = data$x_centered,
  c = 0,
  title = "Sharp RDD",
  x.label = "Running variable",
  y.label = "Outcome"
)

FUZZY RDD

03_fuzzy_rdd.R
library(rdrobust)

# Fuzzy RDD: treatment probability jumps at cutoff but compliance < 100%
# Equivalent to local IV: Z = 1(x >= c) instruments for actual treatment D
fit_fuzzy <- rdrobust(
  y = data$outcome,
  x = data$x_centered,
  c = 0,
  fuzzy = data$treatment,   # actual take-up D (assumed column in the data; not the assignment indicator `treated`)
  kernel = "triangular",
  bwselect = "mserd"
)

summary(fit_fuzzy)
# Coefficient is LATE at the threshold (Fuzzy RD = local IV)
DIAGNOSTICS

Density test

Tests for manipulation of the running variable at the cutoff. The rddensity package implements the Cattaneo-Jansson-Ma (2020) test, which is based on local polynomial density estimation and, unlike the original McCrary (2008) test, does not require pre-binning the data.

Covariate balance

Estimate the RD on each pre-treatment covariate. Estimates should be near zero — any jump indicates the local comparison is confounded by characteristics other than treatment.

Placebo cutoffs

Estimate the RD at artificial thresholds away from the true cutoff. Effects should be negligible at fake cutoffs, ruling out smooth underlying trends being mistaken for a discontinuity.

Bandwidth sensitivity

Re-estimate across a range of bandwidths. Estimates should be stable as the bandwidth varies — large swings suggest either misspecification or very limited local variation.

04_diagnostics.R
library(rdrobust)
library(rddensity)

# 1. Density test (Cattaneo-Jansson-Ma, in the spirit of McCrary)
# H0: density of running variable is continuous at cutoff
# Rejection = sorting / manipulation
dens_test <- rddensity(X = data$x_centered, c = 0)
summary(dens_test)
rdplotdensity(dens_test, data$x_centered)

# 2. Covariate balance (placebo outcomes)
# Estimate RD on pre-treatment covariates — should be near zero
cov_test <- rdrobust(y = data$covariate_1, x = data$x_centered, c = 0)
summary(cov_test)

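# 3. Placebo cutoffs
# Estimate RD at artificial cutoffs away from the true one.
# Use only observations on the same side as the fake cutoff, so the
# real discontinuity at 0 cannot contaminate the placebo estimate.
for (c_fake in c(-20, -10, 10, 20)) {
  same_side <- if (c_fake < 0) data$x_centered < 0 else data$x_centered >= 0
  fit_placebo <- rdrobust(
    y = data$outcome[same_side],
    x = data$x_centered[same_side],
    c = c_fake
  )
  # $coef holds the RD point estimates; the first element is the conventional one
  cat("Cutoff:", c_fake, "| Coef:", round(fit_placebo$coef[1], 3), "\n")
}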

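# 4. Bandwidth sensitivity
# h is in units of the running variable; adjust the grid to your data
for (h in c(5, 10, 15, 20, 25)) {
  fit_h <- rdrobust(y = data$outcome, x = data$x_centered, c = 0, h = h)
  # $coef holds the RD point estimates; the first element is the conventional one
  cat("Bandwidth:", h, "| Coef:", round(fit_h$coef[1], 3), "\n")
}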
OUTPUT INTERPRETATION

My density test rejects — does that invalidate the design?

Not necessarily, but it warrants scrutiny. A rejection means the density of the running variable is not smooth at the cutoff, consistent with sorting. Investigate whether units could plausibly have manipulated their value. Donut-hole RDD — dropping observations very close to the cutoff — can sometimes recover validity if manipulation is confined to a narrow region.
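
A donut-hole estimate can be sketched as follows, on simulated data without manipulation so that both estimates should land near the true jump. The bandwidth `h`, the donut radius `donut`, and the helper `rd_fit` are hypothetical choices for illustration. A uniform kernel is used for brevity.

```r
# Simulated sharp design; all names and values are illustrative assumptions
set.seed(5)
n <- 4000
x <- runif(n, -30, 30)                 # centered running variable
d <- as.integer(x >= 0)
y <- 5 + 0.2 * x + 2 * d + rnorm(n, sd = 1)    # true jump = 2

h     <- 15    # bandwidth (assumption)
donut <- 1     # radius of the excluded region around the cutoff (assumption)

# Hypothetical helper: local linear RD estimate on the rows selected by `keep`
rd_fit <- function(keep) {
  unname(coef(lm(y ~ d * x, subset = keep))["d"])
}

full  <- rd_fit(abs(x) <= h)                       # standard local linear fit
holed <- rd_fit(abs(x) <= h & abs(x) >= donut)     # drops units with |x| < donut

c(full = full, donut = holed)
```

Because the donut estimate extrapolates across the excluded region, it leans more heavily on the functional form; it is worth reporting results for several donut radii.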

Estimates change a lot with bandwidth — which do I report?

Report the MSE-optimal bandwidth from rdrobust as the primary estimate, then show a bandwidth sensitivity plot or table. Instability across bandwidths is itself informative — it suggests the local linear fit is sensitive to how far from the cutoff you look, which may indicate a nonlinear underlying relationship.

My covariate balance test shows a jump — now what?

A statistically significant jump in a pre-treatment covariate at the cutoff is a serious problem — it suggests the continuity assumption fails. Consider whether the covariate could be post-treatment (in which case it shouldn't be used as a balance check), whether the cutoff is determined endogenously, or whether a different comparison is needed.

How do I interpret the fuzzy RDD coefficient?

The fuzzy RDD estimator is equivalent to a local IV estimate: the ratio of the reduced-form effect (Y discontinuity) to the first-stage effect (D discontinuity). It recovers the LATE at the threshold for compliers — units whose treatment status changes because they cross the cutoff. Report first-stage strength (analogous to the IV F-statistic) alongside the fuzzy estimate.
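
The ratio interpretation can be checked by hand on simulated data. In this sketch, take-up probability jumps from 0.2 to 0.8 at the cutoff and the treatment effect is a constant 2; these numbers, the bandwidth `h = 10`, and all variable names are assumptions for illustration.

```r
# Simulated fuzzy design; all names and values are illustrative assumptions
set.seed(6)
n <- 6000
x <- runif(n, -30, 30)                         # centered running variable
z <- as.integer(x >= 0)                        # assignment: which side of the cutoff
treatment <- rbinom(n, 1, 0.2 + 0.6 * z)       # imperfect take-up, jump of 0.6
y <- 5 + 0.1 * x + 2 * treatment + rnorm(n, sd = 1)   # constant effect of 2

h    <- 10                                     # hand-picked bandwidth (assumption)
keep <- abs(x) <= h

# Discontinuity in the outcome (reduced form) and in take-up (first stage)
reduced_form <- unname(coef(lm(y ~ z * x, subset = keep))["z"])
first_stage  <- unname(coef(lm(treatment ~ z * x, subset = keep))["z"])

wald <- reduced_form / first_stage             # fuzzy RD estimate
c(reduced_form = reduced_form, first_stage = first_stage, fuzzy_rd = wald)
```

The reduced-form jump is roughly 0.6 times the treatment effect, and dividing by the first-stage jump rescales it back. A first stage close to zero would make this ratio unstable, which is why first-stage strength should be reported.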