Causal Methods
CHAPTER 05

Matching & IPW

Constructs a valid comparison group by balancing observed covariates — either by pairing treated and control units directly (matching) or by reweighting the sample (IPW). Assumes no unobserved confounding.

IDENTIFICATION SETUP
01 The core idea

If we can observe all variables that jointly determine treatment selection and the outcome, we can construct a valid counterfactual by finding control units that look like treated units in every other respect.

02 Matching vs IPW

Matching pairs each treated unit with one or more similar controls and discards the rest. IPW keeps all units but reweights them so that the covariate distribution of controls mimics the treated group. Both target the ATT under the same assumption.

03 What it estimates

Both methods typically target the ATT — the average effect for treated units. Estimating the ATE requires stricter overlap: every unit must have a reasonable probability of being in either treatment arm.

COVARIATE BALANCE — LOVE PLOT (STYLIZED)

[Stylized Love plot: standardized mean difference (SMD) for Age, Income, Education, Employment, and Region, before matching vs after matching, with the 0.1 threshold marked on the SMD axis.]
SMD: Standardized mean difference — the balance metric
< 0.1: Conventional threshold for acceptable balance
Open dot: Pre-matching imbalance — often large
Filled dot: Post-matching balance — target below threshold

Propensity score matching

Pairs each treated unit with nearest control(s) on PS

Discards unmatched controls — can waste data

Direct interpretability: matched pairs

Sensitive to caliper choice and replacement

Inverse probability weighting

Retains all units, reweights by ps/(1−ps) for controls

Efficient — uses the full sample

Sensitive to extreme propensity scores near 0 or 1

Stabilized weights reduce variance from extreme values
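
The weighting schemes above can be sketched in a few lines of base R. This is a toy simulation with a known propensity score; in a real analysis `ps` would come from a fitted model:

```r
set.seed(42)
n  <- 1000
x  <- rnorm(n)
ps <- plogis(0.5 * x)                 # propensity scores (known here)
t  <- rbinom(n, 1, ps)

# ATT weights: treated units keep weight 1, controls get ps/(1 - ps)
w_att <- ifelse(t == 1, 1, ps / (1 - ps))

# ATE weights: 1/ps for treated, 1/(1 - ps) for controls
w_ate <- ifelse(t == 1, 1 / ps, 1 / (1 - ps))

# Stabilized ATE weights: multiply by the marginal treatment share,
# which shrinks every weight and centers their mean near 1
p_t    <- mean(t)
w_stab <- ifelse(t == 1, p_t / ps, (1 - p_t) / (1 - ps))
```

Because each stabilized weight is the unstabilized weight scaled by a factor below 1, the largest weights shrink the most, which is where the variance reduction comes from.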

ASSUMPTIONS
Conditional ignorability [required]

Given the observed covariates X, treatment assignment is independent of potential outcomes. Equivalently: there are no unobserved variables that jointly determine treatment and the outcome. This is the core — and untestable — assumption.

HOW TO TEST

Theoretical argument and sensitivity analysis (e.g. Rosenbaum bounds). Ask whether any plausible confounder is missing from X; the assumption itself cannot be verified from the data.

Overlap (positivity) [required]

Every unit must have a nonzero probability of receiving either treatment. Units with propensity scores near 0 or 1 are effectively outside the region of common support — their counterfactuals cannot be estimated.

HOW TO TEST

Inspect the propensity score distribution for treated and control groups. Trim units with PS < 0.05 or > 0.95 and check sensitivity of results.
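
The trimming rule amounts to a one-line filter. A sketch with simulated stand-in scores (in practice `ps` would be fitted values from the propensity model):

```r
set.seed(1)
ps      <- runif(500)                # stand-in for fitted propensity scores
treated <- rbinom(500, 1, ps)

# Keep only units inside the region of common support
in_support <- ps > 0.05 & ps < 0.95
n_dropped  <- sum(!in_support)       # report how many units are trimmed

# Re-fit the analysis on the trimmed subset and compare with the
# full-sample estimate to check sensitivity to the trimming rule
```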

No interference (SUTVA) [required]

The potential outcome of unit i depends only on unit i's treatment. Spillovers — where one unit's treatment affects another's outcome — violate SUTVA and bias matching and IPW estimates.

HOW TO TEST

Theoretical argument. For geographic or network data, check whether control units are exposed to treated neighbors.

Correct propensity model [recommended]

The propensity score model should include all relevant confounders and capture their relationship with treatment accurately. A misspecified PS model will fail to balance covariates and bias the estimate.

HOW TO TEST

Check covariate balance after weighting/matching. Balance is the target — not the PS model fit. Try flexible models (CBPS, BART, random forests) if standard logistic regression fails to achieve balance.

DATA REQUIREMENTS

Rich covariate set

All variables that jointly predict treatment and the outcome must be observed. Missing confounders invalidate the ignorability assumption — more covariates is generally better, with regularization if needed.

Common support

The covariate distributions of treated and control groups must overlap. If treated units occupy regions of covariate space with no comparable controls, those units cannot be matched and their effects are unidentified.

Pre-treatment covariates only

All covariates used in the propensity model must be measured before treatment. Including post-treatment variables introduces collider bias and can block the causal path from treatment to outcome.

01_data_prep.R
library(tidyverse)

data <- read_csv("obs_data.csv")

# Inspect covariate distributions by treatment status
# Large differences signal imbalance that matching/weighting must address
data |>
  group_by(treated) |>
  summarise(across(c(age, income, education), list(mean = mean, sd = sd)))

# Standardized mean differences (SMD) — target < 0.1 after matching
smd <- function(x, t) {
  (mean(x[t==1]) - mean(x[t==0])) /
  sqrt((var(x[t==1]) + var(x[t==0])) / 2)
}

tibble(
  variable = c("age", "income", "education"),
  smd = c(smd(data$age, data$treated),
          smd(data$income, data$treated),
          smd(data$education, data$treated))
)
ESTIMATION

Both approaches use the propensity score — the predicted probability of treatment given covariates — as a summary of the high-dimensional covariate vector. Matching uses it to find similar controls; IPW uses it to construct weights that equalize the covariate distributions.

PROPENSITY SCORE MATCHING

02_matching.R
library(MatchIt)
library(lmtest)     # coeftest()
library(sandwich)   # vcovCL()

# Nearest-neighbor propensity score matching (1:1, without replacement)
m_out <- matchit(
  treated ~ age + income + education + covar_4,
  data = data,
  method   = "nearest",   # nearest-neighbor matching
  distance = "glm",       # propensity score via logistic regression
  ratio    = 1,           # 1 control per treated unit
  replace  = FALSE
)

summary(m_out)            # balance table pre/post matching

# Extract matched dataset
m_data <- match.data(m_out)

# Estimate ATT on matched sample
fit <- lm(outcome ~ treated + age + income + education, data = m_data,
          weights = weights)
coeftest(fit, vcov = vcovCL(fit, cluster = ~subclass))

INVERSE PROBABILITY WEIGHTING

03_ipw.R
library(WeightIt)

# Inverse probability weighting for ATT
w_out <- weightit(
  treated ~ age + income + education + covar_4,
  data = data,
  method = "glm",         # propensity score weighting via logistic regression
  estimand = "ATT"        # ATT: weight controls up, treated = 1
)

summary(w_out)            # effective sample size and balance

# Weighted outcome model
fit_ipw <- lm(outcome ~ treated, data = data, weights = w_out$weights)

# Robust (HC3) standard errors
library(lmtest)
library(sandwich)
coeftest(fit_ipw, vcov = vcovHC(fit_ipw, type = "HC3"))
DIAGNOSTICS

Covariate balance (SMD)

The primary diagnostic. After matching or weighting, compute the standardized mean difference for every covariate. All SMDs should fall below 0.1. A Love plot summarizes this visually across all covariates.

Propensity score overlap

Overlay histograms of the PS for treated and control groups. Substantial non-overlap signals a lack of common support — units in non-overlapping regions cannot be validly compared.

Effective sample size (IPW)

Extreme weights reduce the effective sample size. If ESS drops below ~30% of the original sample, stabilized or trimmed weights are needed. The WeightIt summary() reports ESS automatically.
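
The effective sample size is commonly computed with the Kish formula, ESS = (sum of weights)^2 / (sum of squared weights), which is easy to check by hand:

```r
# Kish effective sample size
ess <- function(w) sum(w)^2 / sum(w^2)

ess(rep(1, 100))               # uniform weights: ESS = 100, nothing lost

w_extreme <- c(rep(1, 99), 50) # one unit dominates the sample
ess(w_extreme)                 # collapses to a handful of effective units
```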

Sensitivity analysis

Rosenbaum bounds quantify how strong an unobserved confounder would need to be to overturn the result. Robustness at Γ = 1.5 means the conclusion survives any unobserved confounder that multiplies the odds of treatment by up to 1.5 (a 50% increase).
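
A minimal base-R sketch of the calculation for matched pairs, using the normal approximation to the Wilcoxon signed-rank statistic. Packages such as rbounds automate this; the data here are simulated purely for illustration:

```r
# Upper-bound p-value for the signed-rank test when an unobserved
# confounder can shift the odds of treatment by a factor of gamma
rosenbaum_pvalue <- function(d, gamma = 1) {
  d <- d[d != 0]                   # matched-pair differences, drop ties
  r <- rank(abs(d))
  t_plus <- sum(r[d > 0])          # Wilcoxon signed-rank statistic
  p_plus <- gamma / (1 + gamma)    # worst-case P(treated unit is higher)
  mu <- p_plus * sum(r)
  v  <- p_plus * (1 - p_plus) * sum(r^2)
  1 - pnorm((t_plus - mu) / sqrt(v))
}

set.seed(7)
d <- rnorm(60, mean = 0.5)         # pair differences with a genuine effect
p1 <- rosenbaum_pvalue(d, gamma = 1)  # standard (no hidden bias) p-value
p2 <- rosenbaum_pvalue(d, gamma = 2)  # worst-case bound under hidden bias
```

The effect is reported as robust at a given Γ only if the worst-case bound stays below the significance threshold.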

04_diagnostics.R
library(MatchIt)
library(WeightIt)
library(cobalt)
library(ggplot2)

# 1. Covariate balance after matching/weighting
# Standardized mean differences should be < 0.1 for all covariates
bal.tab(m_out, thresholds = c(m = 0.1))  # matching
bal.tab(w_out, thresholds = c(m = 0.1))  # weighting

# 2. Love plot — visual balance summary
love.plot(m_out, thresholds = c(m = 0.1), stars = "std")

# 3. Propensity score overlap
# Treated and control PS distributions should overlap substantially
data$ps <- fitted(glm(treated ~ age + income + education,
                       data = data, family = binomial))
ggplot(data, aes(x = ps, fill = factor(treated))) +
  geom_histogram(alpha = 0.5, bins = 40, position = "identity") +
  labs(x = "Propensity score", fill = "Treated")

# 4. Effective sample size after IPW
summary(w_out)  # ESS should not fall far below ~30% of the original sample
OUTPUT INTERPRETATION

My SMDs are below 0.1 after matching — is the estimate valid?

Balance on observed covariates is necessary but not sufficient. The estimate is valid if the ignorability assumption holds — i.e. if there are no unobserved confounders. Good balance rules out observed confounding but cannot speak to unobserved variables. Always report a sensitivity analysis alongside the balance table.

I lost many observations after matching — is that a problem?

Discarding unmatched controls is by design — matching enforces common support by restricting the comparison to comparable units. This trades efficiency for bias reduction. If many treated units are also discarded, the estimand has shifted to the subset of treated units with good matches. Report how many units were lost and characterize them.

My matching and IPW estimates differ substantially — which should I report?

Both, with explanation. Matching and IPW can give different estimates because they weight the covariate distribution differently across the region of overlap. Doubly robust (AIPW) estimators combine a propensity model with an outcome model and remain consistent if either is correctly specified; they provide a useful benchmark.
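
A doubly robust (AIPW) estimate can be sketched in base R. The simulation below has a true effect of 2, and both working models use the single confounder x (shown for the ATE for simplicity):

```r
set.seed(123)
n <- 2000
x <- rnorm(n)
t <- rbinom(n, 1, plogis(x))         # treatment depends on confounder x
y <- 1 + 2 * t + x + rnorm(n)        # true treatment effect = 2

ps  <- fitted(glm(t ~ x, family = binomial))               # propensity model
mu1 <- predict(lm(y ~ x, subset = t == 1), data.frame(x))  # outcome model, treated
mu0 <- predict(lm(y ~ x, subset = t == 0), data.frame(x))  # outcome model, control

# AIPW: outcome-model prediction plus an IPW correction term;
# consistent if either the PS model or the outcome model is correct
aipw <- mean(t * (y - mu1) / ps + mu1) -
        mean((1 - t) * (y - mu0) / (1 - ps) + mu0)
```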

Some of my IPW weights are very large — what should I do?

Large weights indicate units with propensity scores near 0 or 1 — they are receiving enormous influence over the estimate. Use stabilized weights to reduce variance, trim extreme weights (e.g. cap at the 99th percentile), or trim units outside the region of common support before estimation. Report sensitivity to weight trimming.
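
Capping at the 99th percentile is a one-line fix. A sketch with simulated right-skewed weights:

```r
set.seed(9)
w <- rexp(1000)^2                    # right-skewed weights with a long tail

cap       <- quantile(w, 0.99)       # 99th-percentile cap
w_trimmed <- pmin(w, cap)

# Only the extreme tail is modified; check how many weights changed
# and report sensitivity of the estimate to the choice of cap
sum(w_trimmed != w)
```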