Instrumental variables
Uses a third variable — the instrument — that shifts treatment but has no direct effect on the outcome. Identifies the local average treatment effect (LATE) for compliers when unobserved confounding rules out DiD or matching.
THE PROBLEM IV SOLVES
When treatment is correlated with unobserved factors that also affect the outcome, OLS is biased. Matching cannot help, because there is no observed variable to condition on that closes the backdoor path, and DiD helps only if the unobserved confounding is fixed over time (parallel trends). IV instead finds a source of variation in treatment that is unrelated to the confounder.
THE ESTIMAND: LATE
When treatment effects are heterogeneous, IV does not in general recover the ATE or ATT. It identifies the local average treatment effect (LATE): the average effect among compliers, the units whose treatment status changes because of the instrument. Always-takers and never-takers contribute no identifying variation.
THE LOGIC: TWO STAGES
First stage: regress treatment on the instrument (and any covariates). This isolates the variation in treatment that comes only from Z.
Second stage: regress the outcome on the fitted treatment values from stage one (and the same covariates). Under the assumptions below, the coefficient is the LATE: the effect of the part of treatment that Z moves.
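The two stages can be sketched on simulated data. This is an illustration built on our own assumptions: a hypothetical unobserved confounder `u` drives both treatment and outcome, the instrument `z` is randomly assigned, and the true effect is 2 for every unit. The manual second stage is used here only to show the point estimate. Its standard errors are not valid.

```r
# Simulated data: all variable names and parameter values are hypothetical
set.seed(42)
n <- 5000
u <- rnorm(n)                                   # unobserved confounder
z <- rbinom(n, 1, 0.5)                          # instrument, randomly assigned
d <- as.numeric(0.8 * z + u + rnorm(n) > 0.5)   # take-up rises with Z and U
y <- 2 * d + 1.5 * u + rnorm(n)                 # true effect of D is 2

# Naive OLS is biased upward: U raises both D and Y
ols <- unname(coef(lm(y ~ d))["d"])

# First stage: keep only the part of D that Z explains
d_hat <- fitted(lm(d ~ z))

# Second stage: regress Y on the fitted treatment (point estimate only)
iv <- unname(coef(lm(y ~ d_hat))["d_hat"])

# With one binary instrument, 2SLS equals the Wald ratio:
# (effect of Z on Y) / (effect of Z on D)
wald <- (mean(y[z == 1]) - mean(y[z == 0])) /
  (mean(d[z == 1]) - mean(d[z == 0]))

print(round(c(ols = ols, iv = iv, wald = wald), 3))
```

The OLS estimate lands well above 2. The IV and Wald estimates coincide and sit near the true effect.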
CAUSAL STRUCTURE
Relevance
The instrument must have a meaningful, non-trivial correlation with the treatment variable. Weak instruments, those with only a small effect on treatment, produce IV estimates that are biased toward OLS, highly variable, and accompanied by unreliable standard errors.
HOW TO TEST
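First-stage F-statistic. Staiger & Stock (1997) suggest F > 10 as a minimum. Stock & Yogo (2005) provide stricter critical values: with one instrument and one endogenous regressor, F > 16.38 keeps the true rejection rate of a nominal 5% Wald test at or below 10%. The F > 104.7 threshold comes from Lee et al. (2022), who show it is needed for the conventional 5% t-test to be valid without adjustment.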
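The first-stage F can be computed by hand as a nested-model comparison, sketched below on simulated data with hypothetical names. This is the classical (homoskedastic) F. With clustered or heteroskedastic errors, a robust Wald version of the same test is the better guide. The printed thresholds are 10 (Staiger and Stock), 16.38 (Stock and Yogo, 10% maximal size, one instrument) and 104.7 (Lee et al. 2022).

```r
# Simulated data with hypothetical names; the first stage is deliberately modest
set.seed(1)
n <- 1000
x <- rnorm(n)                          # exogenous covariate
z <- rnorm(n)                          # instrument
d <- 0.15 * z + 0.5 * x + rnorm(n)     # treatment

# First-stage F: does adding Z improve the model for D beyond the covariates?
restricted <- lm(d ~ x)
unrestricted <- lm(d ~ x + z)
first_stage_F <- anova(restricted, unrestricted)$F[2]

# With a single instrument, this F equals the squared t-statistic on Z
t_z <- summary(unrestricted)$coefficients["z", "t value"]

cat("First-stage F:", round(first_stage_F, 2), "\n")
cat("Above 10 (Staiger-Stock rule of thumb):", first_stage_F > 10, "\n")
cat("Above 16.38 (Stock-Yogo, 10% maximal size, one instrument):",
    first_stage_F > 16.38, "\n")
cat("Above 104.7 (Lee et al. 2022, valid 5% t-test):",
    first_stage_F > 104.7, "\n")
```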
Exclusion restriction
The instrument affects the outcome only through its effect on treatment: there is no direct path from Z to Y. This is a theoretical assumption and cannot be formally tested; it must be argued on substantive grounds.
HOW TO TEST
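Theoretical argument and domain knowledge. With more instruments than endogenous variables, the Sargan-Hansen overidentification test checks whether the instruments agree with one another. It cannot detect a violation that biases every instrument in the same way, so it tests consistency among instruments, not validity in absolute terms.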
Independence
The instrument is independent of the unobserved confounder: as good as randomly assigned. In natural experiments this follows from the design; in other settings it requires a strong institutional argument.
HOW TO TEST
Covariate balance: regress Z on pre-treatment covariates. Significant correlations suggest the independence assumption may be violated.
Monotonicity
The instrument moves treatment in the same direction for all units: there are no defiers. This is required for the LATE interpretation. It is violated if some units take treatment when Z = 0 but not when Z = 1.
HOW TO TEST
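Theoretical argument. There is no direct empirical test. The first-stage coefficient estimates the share of compliers minus the share of defiers, so it equals the complier share only under monotonicity. A necessary but not sufficient check is that the first stage has the same sign in every pre-treatment subgroup.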
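One testable implication of monotonicity is that the first stage should point the same way within every pre-treatment subgroup. The sketch below estimates the first stage separately by subgroup on simulated data. `group` is a hypothetical stand-in for any pre-treatment covariate. Opposite signs would point to defiers, but matching signs do not prove their absence.

```r
# Simulated data: "group" stands in for any pre-treatment covariate (hypothetical)
set.seed(7)
n <- 4000
group <- sample(c("A", "B", "C"), n, replace = TRUE)
z <- rbinom(n, 1, 0.5)
base_rate <- c(A = 0.2, B = 0.3, C = 0.4)[group]
d <- rbinom(n, 1, base_rate + 0.3 * z)   # Z raises take-up in every group

# First-stage coefficient on Z estimated separately within each subgroup
subgroup_fs <- sapply(split(data.frame(d, z), group), function(df) {
  unname(coef(lm(d ~ z, data = df))["z"])
})
print(round(subgroup_fs, 3))

# Under monotonicity every subgroup first stage should have the same sign.
# Opposite signs point to defiers; matching signs do not rule them out.
same_sign <- all(subgroup_fs > 0) || all(subgroup_fs < 0)
cat("All subgroup first stages share a sign:", same_sign, "\n")
```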
A valid instrument
A variable correlated with treatment but with no direct effect on the outcome and independent of confounders. Typically comes from a natural experiment, policy rule, or institutional feature.
Treatment and outcome
Can be cross-sectional or panel. IV is especially valuable when the treatment variable is endogenous — selected by units based on characteristics also affecting the outcome.
Covariates
Pre-treatment variables that may confound the relationship between the instrument and the treatment or outcome. Including them in both stages can improve precision, and is required when the independence assumption holds only conditional on X.
library(tidyverse)
data <- read_csv("iv_data.csv")
# Check instrument relevance visually before estimation
# A strong first stage is prerequisite — not optional
ggplot(data, aes(x = instrument, y = treatment)) +
  geom_point(alpha = 0.3, size = 1) +
  geom_smooth(method = "lm", se = TRUE) +
  labs(
    title = "First stage: instrument vs treatment",
    x = "Instrument (Z)",
    y = "Treatment (D)"
  )
# Check instrument balance across covariates
# Z should be uncorrelated with pre-treatment characteristics
lm(instrument ~ covar_1 + covar_2 + covar_3, data = data) |>
  summary()
Two-stage least squares (2SLS) is the standard IV estimator. In the first stage, treatment is regressed on the instrument and covariates. In the second stage, the outcome is regressed on the fitted treatment values and the same covariates. Never run the two stages manually. The point estimate is right, but the standard errors from a manual second stage are incorrect because its residuals use fitted rather than actual treatment. Use a dedicated IV estimator.
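The sketch below uses simulated data with hypothetical names. It shows where the manual second stage goes wrong: its coefficient matches 2SLS, but its residuals are built from `d_hat` instead of the actual `d`. The correct homoskedastic 2SLS standard error is computed explicitly for comparison. Here the manual SE comes out too large. In other data it can be too small, so the direction of the error is not fixed.

```r
# Simulated data with hypothetical names; u is an unobserved confounder
set.seed(3)
n <- 2000
u <- rnorm(n)
z <- rnorm(n)
x <- rnorm(n)
d <- 0.6 * z + 0.3 * x + u + rnorm(n)
y <- 1 * d + 0.5 * x + 2 * u + rnorm(n)

# Manual two-step procedure
d_hat <- fitted(lm(d ~ z + x))
manual <- lm(y ~ d_hat + x)
beta <- coef(manual)              # 2SLS point estimates (these are fine)

# Correct homoskedastic 2SLS variance: residuals use the ACTUAL treatment d,
# while the inverse uses the fitted design matrix
X_actual <- cbind(1, d, x)
X_fitted <- cbind(1, d_hat, x)
resid_2sls <- as.vector(y - X_actual %*% beta)
sigma2 <- sum(resid_2sls^2) / (n - ncol(X_fitted))
se_correct <- sqrt(diag(sigma2 * solve(crossprod(X_fitted))))[2]

# The manual second stage computes residuals with d_hat instead of d
se_manual <- summary(manual)$coefficients["d_hat", "Std. Error"]

print(round(c(estimate = unname(beta["d_hat"]),
              se_manual = se_manual,
              se_correct = unname(se_correct)), 4))
```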
library(fixest)
# Two-stage least squares via feols
# Formula: outcome ~ exogenous covariates | fixed effects | endogenous ~ instrument
# The fixed-effects part is optional and is simply omitted when there are none
fit_iv <- feols(
  outcome ~ covar_1 + covar_2 | treatment ~ instrument,
  data = data,
  cluster = ~group_id
)
summary(fit_iv)
# First-stage F-statistic (rule of thumb: > 10; Stock-Yogo: > 16.38 with one instrument)
fitstat(fit_iv, "ivf")
# Wu-Hausman endogeneity test
fitstat(fit_iv, "wh")
First-stage F-statistic
Tests instrument relevance. The most important diagnostic — a weak instrument causes severe bias toward OLS. Report alongside the IV estimate. F > 10 is a commonly cited floor, but Stock-Yogo critical values are preferred.
Anderson-Rubin test
A weak-instrument-robust test of the null that β = 0. Unlike the standard Wald test, AR inference is valid even with a weak instrument. Report AR confidence intervals when the first-stage F is borderline.
Sargan-Hansen test
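Tests overidentifying restrictions when you have more instruments than endogenous variables. A rejection suggests that at least one instrument violates the exclusion or independence assumption. With heterogeneous effects, it can also mean that different instruments identify different LATEs. The test cannot identify which instrument is at fault.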
Wu-Hausman test
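Tests whether OLS and IV estimates differ statistically. If treatment is exogenous, OLS is consistent and more efficient than IV. A rejection indicates that OLS is inconsistent, provided the instrument is valid, and so supports using IV. Non-rejection does not prove exogeneity, especially when the IV estimate is imprecise.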
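The Wu-Hausman idea can be sketched in its regression-based (control-function) form on simulated data with hypothetical names. Add the first-stage residuals to the outcome regression and test their coefficient. The coefficient on `d` in that regression reproduces the 2SLS point estimate, but its standard error is not the 2SLS one. We have not checked that this matches fixest's `"wh"` output digit for digit.

```r
# Simulated data with hypothetical names; D is endogenous by construction
set.seed(11)
n <- 3000
u <- rnorm(n)                         # unobserved confounder
z <- rnorm(n)                         # instrument
d <- 0.7 * z + u + rnorm(n)
y <- 1.5 * d + u + rnorm(n)

# Step 1: first-stage residuals hold the part of D not explained by Z
v_hat <- resid(lm(d ~ z))

# Step 2: add them to the outcome regression (control function)
cf <- lm(y ~ d + v_hat)
wh_row <- summary(cf)$coefficients["v_hat", ]
print(round(wh_row, 4))

# H0: coefficient on v_hat = 0 (D exogenous). A small p-value favours IV.
# The point estimate on d equals 2SLS; its reported SE does not.
iv_ratio <- cov(y, z) / cov(d, z)
cat("Control-function estimate:", round(coef(cf)[["d"]], 4),
    " 2SLS:", round(iv_ratio, 4), "\n")
```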
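library(fixest)
# 1. First-stage F-statistic
# Staiger & Stock (1997): F > 10 as minimum
# Stock & Yogo (2005): F > 16.38 for at most 10% size distortion (one instrument)
# Lee et al. (2022): F > 104.7 for a valid conventional 5% t-test
fitstat(fit_iv, "ivf")
# 2. Weak-instrument-robust inference (Anderson-Rubin)
# fixest's fitstat() does not, to our knowledge, report an AR test.
# With one instrument, the AR test of H0: beta = 0 is the test of Z in the
# reduced form (outcome on instrument + covariates), valid regardless of strength
reduced_form <- feols(
  outcome ~ instrument + covar_1 + covar_2,
  data = data,
  cluster = ~group_id
)
coeftable(reduced_form)["instrument", ]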
# 3. Overidentification test (if multiple instruments)
# Sargan-Hansen: H0 = all instruments valid
# Cannot test with exactly one instrument
fitstat(fit_iv, "sargan")
# 4. Endogeneity test (Wu-Hausman)
# H0 = treatment is exogenous (OLS consistent)
# Rejection justifies IV over OLS
fitstat(fit_iv, "wh")
IV identifies only the effect on compliers: units whose treatment status is shifted by the instrument. Whether this is the parameter of interest depends on the question. A policy instrument that affects marginal adopters gives a LATE relevant to expansion policy; an instrument affecting high-intensity users gives a very different LATE.
Compliers
Take treatment when Z = 1, not when Z = 0. The instrument identifies effects only for this group. With binary Z and D and no defiers, their share equals the first-stage coefficient.
Always-takers
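Take treatment regardless of Z. They provide no identifying variation. Because Z is as good as random, they make up the same share of the Z = 1 and Z = 0 groups, so their outcomes cancel in the reduced-form and first-stage differences.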
Never-takers
Never take treatment regardless of Z. Same as always-takers — contribute nothing to identification. Only compliers matter.
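The compliance types can be made concrete with a simulation in which the types are known. The shares (30% always-takers, 50% never-takers, 20% compliers) and the effects are hypothetical. The sketch shows three things. The first-stage difference recovers the complier share. The Wald ratio recovers the compliers' effect even though always-takers differ at baseline. That LATE differs from the ATT, which can only be computed here because the simulation knows each unit's effect.

```r
# Simulated population; type shares and effects are hypothetical
set.seed(2024)
n <- 10000
type <- sample(c("always", "never", "complier"), n, replace = TRUE,
               prob = c(0.3, 0.5, 0.2))
z <- rbinom(n, 1, 0.5)                                   # random instrument
d <- ifelse(type == "always", 1, ifelse(type == "never", 0, z))  # no defiers

# Heterogeneous effects: compliers gain 4, always-takers gain 1
effect <- unname(c(always = 1, never = 0, complier = 4)[type])
y0 <- rnorm(n) + 2 * (type == "always")                  # baseline differs by type
y <- y0 + d * effect

first_stage <- mean(d[z == 1]) - mean(d[z == 0])         # complier share
reduced_form <- mean(y[z == 1]) - mean(y[z == 0])
late <- reduced_form / first_stage                       # Wald / IV estimate
att_true <- mean(effect[d == 1])                         # known only in a simulation

print(round(c(complier_share = first_stage, late = late,
              att_true = att_true), 3))
```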
# IV recovers the LATE: the average treatment effect among compliers only
# Compliers: units whose treatment status changes with the instrument
# Estimate complier share
# Pr(complier) = first-stage coefficient on Z (binary Z and D, no defiers)
first_stage <- feols(treatment ~ instrument + covar_1 + covar_2, data = data)
complier_share <- coef(first_stage)["instrument"]
cat("Estimated complier share:", round(complier_share, 3), "\n")
# LATE equals other estimands only in special cases:
# (a) one-sided noncompliance (no one with Z = 0 is treated,
#     so there are no always-takers): LATE = ATT
# (b) constant treatment effects: LATE = ATE = ATT
# Effects for always-takers and never-takers are not identified
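# Fuzzy RDD is a special case of IV: crossing the cutoff is the instrument
# Use rdrobust for local IV at a threshold
# (running_var and the cutoff c = 0 are placeholders for your own design)
# library(rdrobust)
# rdrobust(y = data$outcome, x = data$running_var, c = 0, fuzzy = data$treatment)
My IV estimate is much larger than OLS — is that plausible?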
Yes, and there are two common explanations. First, OLS may be downward biased if treatment is negatively selected — units who most need treatment receive it, attenuating the naive comparison. Second, IV estimates LATE for compliers, who may respond more to the treatment than the average unit. Always inspect the complier share and think about who they are.
First-stage F = 8 — can I proceed?
Proceed with caution. An F below 10 indicates a weak instrument. Report Anderson-Rubin confidence intervals rather than Wald CIs, because AR inference remains valid under weak instruments. In overidentified models, consider LIML or Fuller estimators, which have better finite-sample properties than 2SLS under weak instruments. With exactly one instrument, LIML coincides with 2SLS, although Fuller's modification still differs.
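An Anderson-Rubin confidence set can be sketched by test inversion. For each candidate effect `b0`, test whether `y - b0 * d` is related to the instrument, and keep every `b0` that is not rejected. The data are simulated with a deliberately weak instrument, and the grid limits and step are arbitrary choices. With a weak instrument the accepted set can be very wide or unbounded, which is the honest answer rather than a defect. Covariates could be added to the right-hand side of `ar_pvalue`'s regression.

```r
# Simulated data with hypothetical names and a deliberately weak instrument
set.seed(5)
n <- 500
u <- rnorm(n)
z <- rnorm(n)
d <- 0.12 * z + u + rnorm(n)
y <- 1 * d + u + rnorm(n)

# AR test of H0: beta = b0. Under H0, y - b0 * d is unrelated to z,
# so test the coefficient on z in that regression (no first stage needed)
ar_pvalue <- function(b0) {
  fit <- lm(I(y - b0 * d) ~ z)
  summary(fit)$coefficients["z", "Pr(>|t|)"]
}

# Invert the test over a grid: the 95% AR set is every b0 not rejected
step <- 0.02
grid <- seq(-10, 10, by = step)
pvals <- vapply(grid, ar_pvalue, numeric(1))
accepted <- grid[pvals > 0.05]

if (length(accepted) == 0) {
  cat("No grid value is accepted\n")
} else {
  cat("Accepted values range from", min(accepted), "to", max(accepted), "\n")
  cat("Accepted set is one interval:", all(diff(accepted) < 1.5 * step), "\n")
  cat("Reaches the grid edge (possibly unbounded):",
      min(accepted) <= min(grid) || max(accepted) >= max(grid), "\n")
}
```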
Sargan-Hansen rejects — what now?
At least one instrument may violate the exclusion or independence assumption. If effects are heterogeneous, the instruments may instead identify different LATEs. You cannot determine which instrument is at fault from the test alone. Investigate each instrument's theoretical justification, drop instruments one at a time and check whether the estimates are stable, or report results as sensitive to instrument choice.
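A Sargan test and the drop-one check can be sketched together on simulated data. The second instrument `z2` violates exclusion by construction, and all names are hypothetical. This is the homoskedastic Sargan form (n times the R² from regressing 2SLS residuals on the instruments); Hansen's J is the heteroskedasticity-robust counterpart. In real data, nothing tells you which of two disagreeing estimates is right. We know it here only because we built the data.

```r
# Simulated data with hypothetical names: z2 violates exclusion by construction
set.seed(9)
n <- 3000
u <- rnorm(n)
z1 <- rnorm(n)
z2 <- rnorm(n)
d <- 0.5 * z1 + 0.5 * z2 + u + rnorm(n)
y <- 1 * d + 0.4 * z2 + u + rnorm(n)     # direct path from z2 to y; true effect 1

# Sargan test: n * R^2 from regressing 2SLS residuals on all instruments,
# chi-squared with (instruments - endogenous regressors) degrees of freedom
sargan_test <- function(y, d, Z) {
  d_hat <- fitted(lm(d ~ Z))
  b <- unname(coef(lm(y ~ d_hat)))       # 2SLS intercept and slope
  e <- y - b[1] - b[2] * d               # residuals use the actual treatment
  stat <- length(y) * summary(lm(e ~ Z))$r.squared
  df <- ncol(Z) - 1
  c(stat = stat, p_value = pchisq(stat, df = df, lower.tail = FALSE))
}
sargan_res <- sargan_test(y, d, cbind(z1, z2))
print(round(sargan_res, 4))

# Drop-one check: just-identified IV estimate from each instrument alone
est_z1 <- cov(y, z1) / cov(d, z1)
est_z2 <- cov(y, z2) / cov(d, z2)
print(round(c(only_z1 = est_z1, only_z2 = est_z2), 3))
```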
Can I use IV in a panel setting with fixed effects?
Yes. Add unit and time fixed effects to both stages. In fixest, the formula | unit_id + year | treatment ~ instrument handles this correctly. Fixed effects absorb time-invariant confounders but do not resolve time-varying endogeneity — the instrument must still satisfy exclusion after demeaning.
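The point about what fixed effects can and cannot fix can be sketched on a simulated balanced panel with hypothetical names. The instrument is correlated with the unit effect, and a time-varying confounder `u` drives both treatment and outcome. Two-way demeaning by hand followed by IV on the demeaned data should give the same point estimate as the fixest formula with `| unit_id + year |` on a balanced panel, but we have not run that comparison here. The manual demeaning is exact only for balanced panels.

```r
# Simulated balanced panel; names and parameters are hypothetical
set.seed(21)
n_units <- 200
n_years <- 10
panel <- expand.grid(unit_id = 1:n_units, year = 1:n_years)
N <- nrow(panel)
alpha <- rnorm(n_units)[panel$unit_id]    # time-invariant unit effect
tau <- rnorm(n_years)[panel$year]         # common year shock
u <- rnorm(N)                             # time-varying confounder
z <- rnorm(N) + 0.5 * alpha               # instrument correlated with unit effect
d <- 0.6 * z + alpha + u + rnorm(N)
y <- 2 * d + 3 * alpha + tau + u + rnorm(N)   # true effect is 2

# Two-way within transformation (exact only for a balanced panel)
demean_twoway <- function(v, unit, year) {
  v - ave(v, unit) - ave(v, year) + mean(v)
}
y_w <- demean_twoway(y, panel$unit_id, panel$year)
d_w <- demean_twoway(d, panel$unit_id, panel$year)
z_w <- demean_twoway(z, panel$unit_id, panel$year)

pooled_iv <- cov(y, z) / cov(d, z)        # ignores FE: exclusion fails via alpha
fe_ols <- cov(y_w, d_w) / var(d_w)        # FE alone: still biased by u
fe_iv <- cov(y_w, z_w) / cov(d_w, z_w)    # FE + IV on demeaned data

print(round(c(pooled_iv = pooled_iv, fe_ols = fe_ols, fe_iv = fe_iv), 3))
```

Only the combination of fixed effects and a valid instrument lands near the true effect of 2. Fixed effects alone leave the time-varying bias, and IV alone fails because the instrument is correlated with the unit effect.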