Regression discontinuity
Exploits sharp cutoffs in assignment rules to identify causal effects. Units just above and just below a threshold are treated as comparable, and the jump in outcomes at the cutoff estimates a local treatment effect.
Running variable
A continuous variable that determines treatment assignment. Units at or above a cutoff are assigned to treatment; those below are not. Examples: exam scores, income relative to an eligibility threshold, age relative to a policy age.
Local comparability
Units just above and just below the cutoff are comparable in all other respects, so any jump in outcomes at the threshold is attributable to treatment rather than to systematic differences between units near the cutoff.
Estimand
The local average treatment effect at the cutoff, not the population ATE. The effect may not generalize to units far from the threshold, and whether it is policy-relevant depends on the research question.
[Figure: Sharp RDD, visualized]
Sharp RDD
Treatment is a deterministic function of the running variable: every unit at or above the cutoff is treated and every unit below is not. Identifies the average treatment effect at the threshold, E[Y(1) - Y(0) | X = c], directly.
Fuzzy RDD
Treatment probability jumps at the cutoff but compliance is imperfect: some units above the cutoff don't take treatment, and some below do. This is equivalent to a local IV, where crossing the cutoff instruments for actual take-up, recovering a LATE for compliers at the threshold.
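The difference between the two designs shows up in how treatment relates to the running variable. The sketch below uses simulated data in base R; the take-up rates of 0.2 and 0.8 in the fuzzy case are illustrative assumptions, not values from any real application.

```r
# Sharp vs fuzzy assignment on simulated data (illustrative only).
set.seed(42)
n <- 1000
x <- runif(n, -10, 10)              # centered running variable; cutoff at 0
z <- as.integer(x >= 0)             # 1 if at or above the cutoff

# Sharp design: treatment is exactly the assignment indicator
d_sharp <- z

# Fuzzy design (assumed take-up rates): crossing the cutoff raises
# the probability of treatment from 0.2 to 0.8
d_fuzzy <- rbinom(n, size = 1, prob = ifelse(z == 1, 0.8, 0.2))

# Share treated below (z = 0) and above (z = 1) the cutoff
share <- data.frame(
  side  = c("below", "above"),
  sharp = as.numeric(tapply(d_sharp, z, mean)),
  fuzzy = as.numeric(tapply(d_fuzzy, z, mean))
)
print(share)

# In the fuzzy design, the first stage is the jump in the share treated
jump_fuzzy <- share$fuzzy[2] - share$fuzzy[1]
cat("Jump in take-up (fuzzy):", round(jump_fuzzy, 3), "\n")
```

Because take-up in this simulation depends on x only through the cutoff, simple means on each side recover the jump; with real data the first stage has to be estimated locally, as rdrobust does.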
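Continuity of potential outcomes
The expected potential outcomes, E[Y(0) | X] and E[Y(1) | X], are continuous in the running variable at the cutoff, so without treatment there would be no jump. A closely related requirement is that units cannot precisely sort themselves to just above or just below the threshold to receive or avoid treatment, since precise sorting would generally break continuity.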
HOW TO TEST
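Continuity itself is not directly testable, because Y(0) is never observed just above the cutoff (nor Y(1) just below), so check its implications instead. Density test (McCrary 2008; the rddensity package implements the Cattaneo-Jansson-Ma version): checks whether the density of the running variable is continuous at the cutoff. A spike in density just above it suggests strategic sorting.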
No precise manipulation
Units cannot precisely control their value of the running variable to fall on a preferred side of the cutoff. Imprecise sorting, where units influence their value but cannot control it exactly, is less concerning than exact manipulation.
HOW TO TEST
The density test is the primary check. Also inspect a histogram of the running variable: a suspicious gap or spike near the cutoff is a red flag.
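Before running a formal density test, a rough count on either side of the cutoff can flag obvious bunching. The base R sketch below simulates a smooth running variable plus a cluster of units just above the cutoff and applies a binomial test to a narrow window. The window width is an assumption, and this heuristic is not a substitute for rddensity.

```r
# Crude bunching check on simulated data: compare counts in a narrow window
# on each side of the cutoff with a binomial test. This is a heuristic, not
# the McCrary or Cattaneo-Jansson-Ma density test.
set.seed(7)
x <- c(runif(2000, -50, 50),   # smooth background density
       runif(150, 0, 1))       # assumed bunching just above the cutoff at 0

window <- 1                    # half-width of the window; tune to your scale
near <- x[abs(x) < window]
n_above <- sum(near >= 0)
n_below <- sum(near < 0)

# With a smooth density and a narrow window, each unit in the window is
# roughly equally likely to fall on either side of the cutoff
bt <- binom.test(n_above, n_above + n_below, p = 0.5)
cat("Below:", n_below, "| Above:", n_above,
    "| Binomial p-value:", signif(bt$p.value, 3), "\n")
```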
Covariate smoothness
Pre-treatment covariates should not jump at the cutoff. If they do, units just above and below differ systematically in ways other than treatment status, which invalidates the comparison.
HOW TO TEST
Estimate the RD on each pre-treatment covariate as a placebo outcome. Coefficients should be statistically and economically close to zero.
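The placebo-outcome logic can be sketched with a hand-rolled local linear estimator in base R. `rd_local_linear()` is a hypothetical helper written for this illustration: it fits a triangular-kernel regression with separate slopes on each side and returns the jump at the cutoff, but omits the bias correction and robust standard errors that rdrobust provides. The covariates are simulated to be smooth in x, so their jumps should be small; dividing by each covariate's standard deviation gives a rough sense of economic size.

```r
# Hypothetical helper: local linear RD jump with a triangular kernel
rd_local_linear <- function(y, x, h, c = 0) {
  w   <- pmax(0, 1 - abs(x - c) / h)   # triangular kernel weights
  d   <- as.numeric(x >= c)            # indicator for being above the cutoff
  xc  <- x - c
  fit <- lm(y ~ d * xc, weights = w, subset = w > 0)
  unname(coef(fit)["d"])               # jump in the fitted line at the cutoff
}

# Simulated pre-treatment covariates that are smooth in x (no true jump)
set.seed(3)
n <- 4000
x <- runif(n, -50, 50)
covs <- data.frame(
  age    = 30 + 0.1 * x + rnorm(n, sd = 2),
  income = 40 + 0.2 * x + rnorm(n, sd = 4)
)

# Treat each covariate as a placebo outcome
jumps <- sapply(covs, rd_local_linear, x = x, h = 15)

# Express each jump in standard-deviation units to judge economic size
balance <- data.frame(
  covariate = names(covs),
  jump      = round(jumps, 3),
  jump_sd   = round(jumps / sapply(covs, sd), 3)
)
print(balance)
```

For the statistical side of the check, run rdrobust on each covariate as in the diagnostics code, which reports robust p-values.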
Correct functional form
The local polynomial used to model outcomes on each side of the cutoff should fit the data well. Misspecification, such as forcing linearity on a curved relationship, can bias the discontinuity estimate.
HOW TO TEST
Try different polynomial orders (p=1, p=2) and check that estimates are stable across orders. Prefer local linear (p=1) with an optimal bandwidth over higher-order polynomials.
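A minimal sketch of the order comparison in base R, assuming simulated data with a mildly curved outcome and a true jump of 5. `rd_poly()` is a hypothetical helper that fits a separate polynomial of order p on each side within a triangular-kernel bandwidth; because the raw polynomial terms are centered at the cutoff, the coefficient on the above-cutoff indicator is the estimated discontinuity.

```r
# Hypothetical helper: RD jump from a local polynomial of order p
rd_poly <- function(y, x, h, p, c = 0) {
  w  <- pmax(0, 1 - abs(x - c) / h)   # triangular kernel weights
  d  <- as.numeric(x >= c)            # above-cutoff indicator
  xc <- x - c                         # centered so all powers are 0 at the cutoff
  # Interacting d with every power of xc gives a separate polynomial per side
  fit <- lm(y ~ d * poly(xc, degree = p, raw = TRUE),
            weights = w, subset = w > 0)
  unname(coef(fit)["d"])
}

# Simulated outcome: curved in x, true jump of 5 at the cutoff
set.seed(11)
n <- 10000
x <- runif(n, -50, 50)
y <- 10 + 0.3 * x + 0.01 * x^2 + 5 * (x >= 0) + rnorm(n, sd = 3)

est <- sapply(1:2, function(p) rd_poly(y, x, h = 20, p = p))
names(est) <- c("p = 1", "p = 2")
print(round(est, 3))
```

Estimates that move substantially between p = 1 and p = 2 would suggest the functional form is driving the result. The quadratic fit is typically noisier, which is one reason local linear is the usual default.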
Running variable
A continuous variable with a known, externally determined cutoff. Must be observed for all units. Bunching at round numbers or the cutoff itself warrants scrutiny.
Sharp cutoff
Treatment assignment (in a fuzzy design, the probability of treatment) must change discontinuously at a specific, known value. The cutoff should be set by a rule external to the units, not chosen endogenously by administrators with discretion.
Sufficient density near cutoff
Estimation is local to the threshold, so precise estimates need enough observations within the optimal bandwidth on both sides. Sparse data near the cutoff inflates standard errors.
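A quick way to see whether the data near the threshold are thin is to count observations within candidate bandwidths on each side. The simulated running variable below is deliberately dense below the cutoff and sparse above it; the bandwidths are assumptions chosen for its scale.

```r
# Effective sample sizes near the cutoff on simulated data.
# Assumed setup: dense data below the cutoff, sparse data above it.
set.seed(5)
x <- c(runif(2500, -30, 0), runif(500, 0, 30))   # centered running variable, cutoff at 0

counts <- data.frame(h = c(2, 5, 10))             # candidate bandwidths
counts$left  <- sapply(counts$h, function(h) sum(x >= -h & x < 0))
counts$right <- sapply(counts$h, function(h) sum(x >= 0 & x <= h))
counts$smaller_side <- pmin(counts$left, counts$right)
print(counts)
```

The sparser side typically limits precision, so the smaller count is the one to watch.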
library(tidyverse)
data <- read_csv("rdd_data.csv")
# Center the running variable at the cutoff
# Makes threshold = 0, which is convention for rdrobust
cutoff <- 50 # e.g. an exam score, income threshold, age
data <- data |>
  mutate(
    x_centered = running_var - cutoff,
    treated = as.integer(running_var >= cutoff)
  )
# Visual check: raw outcome vs running variable
ggplot(data, aes(x = x_centered, y = outcome)) +
  geom_point(alpha = 0.3, size = 0.8) +
  geom_vline(xintercept = 0, linetype = "dashed", color = "grey60") +
  labs(x = "Running variable (centered)", y = "Outcome")
The standard estimator fits local linear regressions on each side of the cutoff using observations within an optimal bandwidth. The discontinuity estimate is the difference in fitted values at the cutoff. Bandwidth selection balances bias (wider bandwidth, more misfit) against variance (narrower bandwidth, fewer observations).
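The estimator just described can be written out by hand to show what rdrobust automates. This base R sketch assumes simulated data with a true jump of 5, a fixed bandwidth of 10 instead of an MSE-optimal one, and no bias correction. It fits a triangular-kernel weighted line on each side and takes the difference of the two intercepts, which are the fitted values at the cutoff.

```r
# Local linear RD estimator written out by hand (sketch on simulated data)
set.seed(2024)
n <- 10000
x <- runif(n, -50, 50)                                  # centered running variable
y <- 10 + 0.3 * x + 5 * (x >= 0) + rnorm(n, sd = 3)     # true jump of 5 at x = 0

h <- 10                                                 # fixed bandwidth (assumed)
w <- pmax(0, 1 - abs(x) / h)                            # triangular kernel weights

# Weighted linear fit on each side, using only observations inside the bandwidth
left  <- x < 0  & w > 0
right <- x >= 0 & w > 0
fit_left  <- lm(y ~ x, weights = w, subset = left)
fit_right <- lm(y ~ x, weights = w, subset = right)

# Intercepts are the fitted values at x = 0; their difference is the estimate
tau_hat <- unname(coef(fit_right)[1] - coef(fit_left)[1])
cat("Estimated discontinuity:", round(tau_hat, 3), "\n")
```

Fitting the two sides separately gives the same point estimate as a single regression with the above-cutoff indicator interacted with x, which is the form used in the other sketches.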
SHARP RDD
library(rdrobust)
# Sharp RDD: treatment deterministically assigned at cutoff
# rdrobust selects bandwidth optimally (MSE-optimal, CCT 2014)
fit_sharp <- rdrobust(
  y = data$outcome,
  x = data$x_centered,
  c = 0,                   # cutoff (already centered)
  kernel = "triangular",   # triangular kernel downweights distant obs
  bwselect = "mserd"       # MSE-optimal bandwidth (default)
)
summary(fit_sharp)
# RD plot: binned scatter with fitted polynomials on each side
rdplot(
  y = data$outcome,
  x = data$x_centered,
  c = 0,
  title = "Sharp RDD",
  x.label = "Running variable",
  y.label = "Outcome"
)
FUZZY RDD
library(rdrobust)
# Fuzzy RDD: treatment probability jumps at cutoff but compliance < 100%
# Equivalent to local IV: Z = 1(x >= c) instruments for actual treatment D
fit_fuzzy <- rdrobust(
  y = data$outcome,
  x = data$x_centered,
  c = 0,
  fuzzy = data$treatment,  # actual take-up D (assumed column, distinct from the assignment indicator `treated`)
  kernel = "triangular",
  bwselect = "mserd"
)
summary(fit_fuzzy)
# Coefficient is the LATE at the threshold (fuzzy RD = local IV)
Density test
Tests for manipulation of the running variable at the cutoff. The rddensity package implements the Cattaneo-Jansson-Ma (2020) local polynomial density test, which builds on the original McCrary (2008) test but does not require pre-binning the data.
Covariate balance
Estimate the RD on each pre-treatment covariate. Estimates should be near zero — any jump indicates the local comparison is confounded by characteristics other than treatment.
Placebo cutoffs
Estimate the RD at artificial thresholds away from the true cutoff. Effects should be negligible at fake cutoffs, ruling out smooth underlying trends being mistaken for a discontinuity.
Bandwidth sensitivity
Re-estimate across a range of bandwidths. Estimates should be stable as the bandwidth varies — large swings suggest either misspecification or very limited local variation.
library(rdrobust)
library(rddensity)
# 1. Density test (McCrary / Cattaneo-Jansson-Ma)
# H0: density of running variable is continuous at cutoff
# Rejection = sorting / manipulation
dens_test <- rddensity(X = data$x_centered, c = 0)
summary(dens_test)
rdplotdensity(dens_test, data$x_centered)
# 2. Covariate balance (placebo outcomes)
# Estimate RD on pre-treatment covariates — should be near zero
cov_test <- rdrobust(y = data$covariate_1, x = data$x_centered, c = 0)
summary(cov_test)
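# 3. Placebo cutoffs
# Estimate RD at artificial cutoffs away from the true one, using only
# observations on the same side of the true cutoff as the placebo
for (c_fake in c(-20, -10, 10, 20)) {
  same_side <- if (c_fake < 0) data$x_centered < 0 else data$x_centered >= 0
  fit_placebo <- rdrobust(
    y = data$outcome[same_side],
    x = data$x_centered[same_side],
    c = c_fake
  )
  # $coef[1] is the conventional estimate; $pv[3] is the robust p-value
  cat("Cutoff:", c_fake, "| Coef:", round(fit_placebo$coef[1], 3),
      "| Robust p:", round(fit_placebo$pv[3], 3), "\n")
}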
# 4. Bandwidth sensitivity
# Bandwidths are in units of the centered running variable; adjust to its scale
for (h in c(5, 10, 15, 20, 25)) {
  fit_h <- rdrobust(y = data$outcome, x = data$x_centered, c = 0, h = h)
  cat("Bandwidth:", h, "| Coef:", round(fit_h$coef[1], 3), "\n")
}
My density test rejects: does that invalidate the design?
Not necessarily, but it warrants scrutiny. A rejection means the density of the running variable is not smooth at the cutoff, which is consistent with sorting. Investigate whether units could plausibly have manipulated their value. A donut-hole RDD, which drops observations very close to the cutoff, can sometimes recover credible estimates if manipulation is confined to a narrow region, at the cost of extrapolating each side's fit further to reach the cutoff.
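A minimal donut-hole sketch in base R, on simulated data where units just below the cutoff with high unobserved outcomes are assumed to move just above it. `rd_local_linear()` is a hypothetical helper (triangular kernel, separate slopes, no bias correction), and the hole half-width of 1 is an assumption. In this setup every manipulated unit lands inside the hole, so the donut fit never uses them; the cost is that each side's line is extrapolated across the hole to reach the cutoff.

```r
# Hypothetical helper: local linear RD jump at a cutoff of 0
rd_local_linear <- function(y, x, h) {
  w <- pmax(0, 1 - abs(x) / h)          # triangular kernel weights
  d <- as.numeric(x >= 0)               # above-cutoff indicator
  fit <- lm(y ~ d * x, weights = w, subset = w > 0)
  unname(coef(fit)["d"])
}

set.seed(99)
n <- 10000
x <- runif(n, -50, 50)                  # centered running variable
u <- rnorm(n, sd = 3)                   # unobserved determinant of the outcome

# Assumed manipulation: units just below the cutoff with high u move just above it
movers <- x >= -1 & x < 0 & u > 0
x[movers] <- -x[movers]                 # maps [-1, 0) into (0, 1]

y <- 10 + 0.3 * x + 5 * (x >= 0) + u    # true jump of 5

h <- 10
est_full  <- rd_local_linear(y, x, h)
keep      <- abs(x) >= 1                # donut: drop units with |x| < 1
est_donut <- rd_local_linear(y[keep], x[keep], h)
cat("Full sample:", round(est_full, 3),
    "| Donut (|x| >= 1):", round(est_donut, 3), "\n")
```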
Estimates change a lot with bandwidth — which do I report?
Report the MSE-optimal bandwidth from rdrobust as the primary estimate, then show a bandwidth sensitivity plot or table. Instability across bandwidths is itself informative — it suggests the local linear fit is sensitive to how far from the cutoff you look, which may indicate a nonlinear underlying relationship.
My covariate balance test shows a jump — now what?
A statistically significant jump in a pre-treatment covariate at the cutoff is a serious problem — it suggests the continuity assumption fails. Consider whether the covariate could be post-treatment (in which case it shouldn't be used as a balance check), whether the cutoff is determined endogenously, or whether a different comparison is needed.
How do I interpret the fuzzy RDD coefficient?
The fuzzy RDD estimator is equivalent to a local IV estimate: the ratio of the reduced-form effect (Y discontinuity) to the first-stage effect (D discontinuity). It recovers the LATE at the threshold for compliers — units whose treatment status changes because they cross the cutoff. Report first-stage strength (analogous to the IV F-statistic) alongside the fuzzy estimate.
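The ratio described above can be computed directly from two sharp RD estimates that share the same kernel and bandwidth. In this base R sketch, `rd_jump()` is a hypothetical helper, the data are simulated with take-up rising from 0.2 to 0.8 at the cutoff and a constant treatment effect of 4, and no standard errors are computed. For inference, the `fuzzy` argument of rdrobust handles both stages together.

```r
# Hypothetical helper: local linear jump at a cutoff of 0 (triangular kernel)
rd_jump <- function(v, x, h) {
  w <- pmax(0, 1 - abs(x) / h)                 # triangular kernel weights
  z <- as.numeric(x >= 0)                      # instrument: above the cutoff
  fit <- lm(v ~ z * x, weights = w, subset = w > 0)
  unname(coef(fit)["z"])
}

set.seed(8)
n <- 20000
x <- runif(n, -50, 50)
d <- rbinom(n, 1, ifelse(x >= 0, 0.8, 0.2))    # take-up jumps from 0.2 to 0.8
y <- 10 + 0.3 * x + 4 * d + rnorm(n, sd = 3)   # constant effect of 4 (assumed)

h <- 15
first_stage  <- rd_jump(d, x, h)               # jump in take-up, about 0.6 here
reduced_form <- rd_jump(y, x, h)               # jump in outcome, about 0.6 * 4
late <- reduced_form / first_stage             # Wald ratio = fuzzy RD estimate
cat("First stage:", round(first_stage, 3),
    "| Reduced form:", round(reduced_form, 3),
    "| Fuzzy RD (LATE):", round(late, 3), "\n")
```

A small first stage makes the ratio unstable, which is why its strength should be reported alongside the estimate.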