Matching & IPW
Constructs a valid comparison group by balancing observed covariates — either by pairing treated and control units directly (matching) or by reweighting the sample (IPW). Assumes no unobserved confounding.
If we can observe all variables that jointly determine treatment selection and the outcome, we can construct a valid counterfactual by finding control units that look like treated units in every other respect.
Matching pairs each treated unit with one or more similar controls and discards the rest. IPW keeps all units but reweights them so that the covariate distribution of the controls mimics that of the treated group. Both rest on the same identifying assumption of no unobserved confounding.
Both methods typically target the ATT — the average effect for treated units. Estimating the ATE requires stricter overlap: every unit must have a reasonable probability of being in either treatment arm.
COVARIATE BALANCE — LOVE PLOT (STYLIZED)
Propensity score matching
Pairs each treated unit with nearest control(s) on PS
Discards unmatched controls — can waste data
Direct interpretability: matched pairs
Sensitive to caliper choice and replacement
Inverse probability weighting
Retains all units, reweights by ps/(1−ps) for controls
Efficient — uses the full sample
Sensitive to extreme propensity scores near 0 or 1
Stabilized weights reduce variance from extreme values
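As a concrete illustration, here is a minimal sketch of these ATT weights computed by hand. It assumes the data frame and column names (treated, age, income, education) used in the code examples further down the page.
ps_fit <- glm(treated ~ age + income + education, data = data, family = binomial)
ps <- fitted(ps_fit)
# ATT weights: treated units get weight 1, controls get ps / (1 - ps)
w_att <- ifelse(data$treated == 1, 1, ps / (1 - ps))
# Stabilized version: rescale control weights by the marginal odds of being
# untreated so they average roughly 1 and stay on an interpretable scale
p1 <- mean(data$treated)
w_stab <- ifelse(data$treated == 1, 1, (ps / (1 - ps)) * ((1 - p1) / p1))
summary(w_att)
summary(w_stab)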
Given the observed covariates X, treatment assignment is independent of potential outcomes. Equivalently: there are no unobserved variables that jointly determine treatment and the outcome. This is the core — and untestable — assumption.
HOW TO TEST
Theoretical argument plus sensitivity analysis (e.g. Rosenbaum bounds). Argue on substantive grounds that no plausible confounder is omitted from X, and report how strong an omitted confounder would have to be to overturn the result.
Every unit must have a nonzero probability of receiving either treatment. Units with propensity scores near 0 or 1 are effectively outside the region of common support — their counterfactuals cannot be estimated.
HOW TO TEST
Inspect the propensity score distribution for treated and control groups. Trim units with PS < 0.05 or > 0.95 and check sensitivity of results.
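A minimal sketch of this check, assuming the data frame and covariates used in the later examples; the 0.05 and 0.95 cutoffs are a rule of thumb, not a universal constant.
data$ps <- fitted(glm(treated ~ age + income + education,
                      data = data, family = binomial))
# Share of units outside the common-support band
mean(data$ps < 0.05 | data$ps > 0.95)
# Re-run matching/weighting on the trimmed sample and compare the estimates
data_trim <- subset(data, ps >= 0.05 & ps <= 0.95)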
The potential outcome of unit i depends only on unit i's treatment. Spillovers — where one unit's treatment affects another's outcome — violate SUTVA and bias matching and IPW estimates.
HOW TO TEST
Theoretical argument. For geographic or network data, check whether control units are exposed to treated neighbors.
The propensity score model should include all relevant confounders and capture their relationship with treatment accurately. A misspecified PS model will fail to balance covariates and bias the estimate.
HOW TO TEST
Check covariate balance after weighting/matching. Balance is the target — not the PS model fit. Try flexible models (CBPS, BART, random forests) if standard logistic regression fails to achieve balance.
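For instance, WeightIt exposes covariate balancing propensity scores through method = "cbps" (it calls the CBPS package, which must be installed). A sketch, judged purely on the balance it achieves:
library(WeightIt)
library(cobalt)
w_cbps <- weightit(treated ~ age + income + education + covar_4,
                   data = data, method = "cbps", estimand = "ATT")
bal.tab(w_cbps, thresholds = c(m = 0.1))  # balance, not model fit, is the criterion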
Rich covariate set
All variables that jointly predict treatment and the outcome must be observed. Missing confounders invalidate the ignorability assumption; including more pre-treatment covariates is generally better, with regularization if needed.
Common support
The covariate distributions of treated and control groups must overlap. If treated units occupy regions of covariate space with no comparable controls, those units cannot be matched and their effects are unidentified.
Pre-treatment covariates only
All covariates used in the propensity model must be measured before treatment. Including post-treatment variables introduces collider bias and can block the causal path from treatment to outcome.
library(tidyverse)
data <- read_csv("obs_data.csv")
# Inspect covariate distributions by treatment status
# Large differences signal imbalance that matching/weighting must address
data |>
  group_by(treated) |>
  summarise(across(c(age, income, education), list(mean = mean, sd = sd)))
# Standardized mean differences (SMD) — target < 0.1 after matching
smd <- function(x, t) {
  (mean(x[t == 1]) - mean(x[t == 0])) /
    sqrt((var(x[t == 1]) + var(x[t == 0])) / 2)
}
tibble(
  variable = c("age", "income", "education"),
  smd = c(smd(data$age, data$treated),
          smd(data$income, data$treated),
          smd(data$education, data$treated))
)
Both approaches use the propensity score — the predicted probability of treatment given covariates — as a summary of the high-dimensional covariate vector. Matching uses it to find similar controls; IPW uses it to construct weights that equalize the covariate distributions.
PROPENSITY SCORE MATCHING
library(MatchIt)
# Nearest-neighbor propensity score matching (1:1, without replacement)
m_out <- matchit(
  treated ~ age + income + education + covar_4,
  data = data,
  method = "nearest",  # nearest-neighbor
  distance = "glm",    # propensity score via logistic regression
  ratio = 1,           # 1 control per treated unit
  replace = FALSE
)
summary(m_out) # balance table pre/post matching
# Extract matched dataset
m_data <- match.data(m_out)
# Estimate ATT on matched sample (cluster-robust SEs by matched pair)
library(lmtest)     # coeftest()
library(sandwich)   # vcovCL()
fit <- lm(outcome ~ treated + age + income + education,
          data = m_data, weights = weights)
coeftest(fit, vcov = vcovCL(fit, cluster = ~subclass))
INVERSE PROBABILITY WEIGHTING
library(WeightIt)
# Inverse probability weighting for ATT
w_out <- weightit(
  treated ~ age + income + education + covar_4,
  data = data,
  method = "ps",      # propensity score weighting
  estimand = "ATT"    # ATT: weight controls up, treated = 1
)
summary(w_out) # effective sample size and balance
# Weighted outcome model
library(marginaleffects)
fit_ipw <- lm(outcome ~ treated, data = data, weights = w_out$weights)
# Robust (HC3) sandwich SEs; the default lm SEs treat the weights as precision weights
library(sandwich)
library(lmtest)
coeftest(fit_ipw, vcov = vcovHC(fit_ipw, type = "HC3"))
Covariate balance (SMD)
The primary diagnostic. After matching or weighting, compute the standardized mean difference for every covariate. All SMDs should fall below 0.1. A Love plot summarizes this visually across all covariates.
Propensity score overlap
Overlay histograms of the PS for treated and control groups. Substantial non-overlap signals a lack of common support — units in non-overlapping regions cannot be validly compared.
Effective sample size (IPW)
Extreme weights reduce the effective sample size. If ESS drops below ~30% of the original sample, stabilized or trimmed weights are needed. The WeightIt summary() reports ESS automatically.
Sensitivity analysis
Rosenbaum bounds quantify how strong an unobserved confounder would need to be to overturn the result. If the effect remains significant up to Γ = 1.5, an unobserved confounder would have to shift the odds of treatment by a factor of 1.5 (50%) before it could explain away the effect.
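One way to compute these bounds in R is the rbounds package. A sketch, assuming the 1:1 matched m_out object and the outcome variable from the matching code above:
library(rbounds)
# Pair matched treated and control outcomes by subclass from the MatchIt output
md <- match.data(m_out)
y_t <- md$outcome[md$treated == 1][order(md$subclass[md$treated == 1])]
y_c <- md$outcome[md$treated == 0][order(md$subclass[md$treated == 0])]
# psens() reports significance bounds as Gamma (hidden bias) grows; the Gamma at
# which the upper-bound p-value crosses 0.05 summarizes how fragile the result is
psens(y_t, y_c, Gamma = 2, GammaInc = 0.25)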
library(MatchIt)
library(WeightIt)
library(cobalt)
# 1. Covariate balance after matching/weighting
# Standardized mean differences should be < 0.1 for all covariates
bal.tab(m_out, thresholds = c(m = 0.1)) # matching
bal.tab(w_out, thresholds = c(m = 0.1)) # weighting
# 2. Love plot — visual balance summary
love.plot(m_out, thresholds = c(m = 0.1), stars = "std")
# 3. Propensity score overlap
# Treated and control PS distributions should overlap substantially
library(ggplot2)
data$ps <- fitted(glm(treated ~ age + income + education,
                      data = data, family = binomial))
ggplot(data, aes(x = ps, fill = factor(treated))) +
  geom_histogram(alpha = 0.5, bins = 40, position = "identity") +
  labs(x = "Propensity score", fill = "Treated")
# 4. Effective sample size after IPW
summary(w_out) # ESS should be > 10% of original sample
My SMDs are below 0.1 after matching — is the estimate valid?
Balance on observed covariates is necessary but not sufficient. The estimate is valid if the ignorability assumption holds — i.e. if there are no unobserved confounders. Good balance rules out observed confounding but cannot speak to unobserved variables. Always report a sensitivity analysis alongside the balance table.
I lost many observations after matching — is that a problem?
Discarding unmatched controls is by design — matching enforces common support by restricting the comparison to comparable units. The tradeoff is efficiency for bias reduction. If too many treated units are also discarded, the estimand has shifted to a subset of treated units with good matches. Report how many units were lost and characterize them.
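A sketch of that report using the m_out object from above (matchit assigns weight 0 to discarded units):
summary(m_out)$nn                       # matched vs. unmatched counts by group
dropped <- data[m_out$weights == 0, ]   # units discarded by matching
kept    <- data[m_out$weights > 0, ]
# Compare covariate means to characterize who was dropped
colMeans(dropped[, c("age", "income", "education")])
colMeans(kept[, c("age", "income", "education")])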
My matching and IPW estimates differ substantially — which should I report?
Both, with an explanation. Matching and IPW weight the covariate distribution differently within the region of overlap, so their estimates can diverge when effects are heterogeneous or the propensity model is imperfect. Doubly robust (AIPW) estimators, which combine a propensity model with an outcome regression and remain consistent if either is correctly specified, provide a useful benchmark.
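A hand-rolled sketch of the AIPW idea, in its ATE form for clarity and using the same columns as the earlier examples; dedicated doubly robust packages are preferable in practice.
# Propensity model plus separate outcome models for treated and control units
e  <- fitted(glm(treated ~ age + income + education, data = data, family = binomial))
m1 <- lm(outcome ~ age + income + education, data = subset(data, treated == 1))
m0 <- lm(outcome ~ age + income + education, data = subset(data, treated == 0))
mu1 <- predict(m1, newdata = data)
mu0 <- predict(m0, newdata = data)
# Regression predictions augmented with inverse-probability-weighted residuals;
# consistent if either the outcome model or the propensity model is correct
aipw_terms <- mu1 - mu0 +
  data$treated * (data$outcome - mu1) / e -
  (1 - data$treated) * (data$outcome - mu0) / (1 - e)
mean(aipw_terms)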
Some of my IPW weights are very large — what should I do?
Large weights indicate units with propensity scores near 0 or 1 — they are receiving enormous influence over the estimate. Use stabilized weights to reduce variance, trim extreme weights (e.g. cap at the 99th percentile), or trim units outside the region of common support before estimation. Report sensitivity to weight trimming.
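A sketch of both remedies with WeightIt, assuming the w_out object from the IPW code above:
library(WeightIt)
# Stabilized weights at estimation time
w_stab <- weightit(treated ~ age + income + education + covar_4,
                   data = data, method = "ps", estimand = "ATT",
                   stabilize = TRUE)
# Or trim an existing set of weights at the 99th percentile
w_trim <- trim(w_out, at = 0.99)
# Compare effective sample sizes, re-estimate, and report sensitivity to the choice
summary(w_stab)
summary(w_trim)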