Causal Methods
SECTION 02

Causal ML

Methods that combine flexible machine learning with causal identification — enabling valid effect estimation in high-dimensional settings and principled estimation of treatment effect heterogeneity.

WHY MACHINE LEARNING + CAUSALITY
01

Classical causal methods require the researcher to specify control variables parametrically. In high-dimensional settings this risks omitted-variable bias and model misspecification.

02

ML models can approximate complex nuisance functions — the conditional mean of Y or T given X — without specifying a parametric form. But naive plug-in ML estimates of causal parameters suffer from regularization bias.

03

Causal ML resolves this tension: use flexible ML for the parts you don't care about (nuisance), and apply orthogonalization and cross-fitting to recover valid, root-n consistent causal estimates.

DOUBLE ML — PARTIALLING OUT
[Diagram: covariates X enter two nuisance models, ml_m (E[T | X]) and ml_l (E[Y | X]); the treatment T and outcome Y are residualized against their predictions, and the causal effect θ is estimated from the residuals. Nuisance models are cross-fitted on held-out folds.]
ml_m: predicts treatment T from covariates X — E[T | X]
ml_l: predicts outcome Y from covariates X — E[Y | X]
T̃: treatment residual — T minus its predicted value
Ỹ: outcome residual — Y minus its predicted value
θ: causal effect — OLS of Ỹ on T̃ after partialling out
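The mechanics can be sketched by hand. Below is a minimal illustration with random forests from the ranger package and 2-fold cross-fitting on simulated data; the variable names and the simulated design are illustrative, not part of any library API.

partialling_out_sketch.R
library(ranger)

set.seed(1)
n  <- 2000
X  <- matrix(rnorm(n * 10), n, 10)
Tr <- 0.5 * X[, 1] + rnorm(n)            # treatment depends on X
Y  <- 1.0 * Tr + X[, 1] + rnorm(n)       # true effect theta = 1
df <- data.frame(X, Tr = Tr, Y = Y)

folds <- sample(rep(1:2, length.out = n))
T_res <- Y_res <- numeric(n)

for (k in 1:2) {
  train <- df[folds != k, ]
  test  <- folds == k
  ml_m <- ranger(Tr ~ . - Y, data = train)   # E[T | X]
  ml_l <- ranger(Y ~ . - Tr, data = train)   # E[Y | X]
  T_res[test] <- df$Tr[test] - predict(ml_m, df[test, ])$predictions
  Y_res[test] <- df$Y[test]  - predict(ml_l, df[test, ])$predictions
}

# OLS of the outcome residual on the treatment residual recovers theta
theta <- coef(lm(Y_res ~ T_res - 1))

With cross-fitting, each unit's residuals come from nuisance models that never saw that unit, which is what removes the overfitting bias described below.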
KEY CONCEPTS
Cross-fitting: Splits data into folds; nuisance models trained on one fold predict on held-out folds. Removes overfitting bias from the causal estimate.
Neyman orthogonality: The moment condition for the causal parameter is insensitive to small perturbations in the nuisance functions. Enables root-n consistent estimation.
CATE: Conditional average treatment effect — the expected treatment effect for a unit with covariates X. The target quantity in HTE methods.
Doubly robust: An estimator is doubly robust if it is consistent when either the outcome model or the propensity model is correctly specified — not necessarily both (see the sketch after this list).
Honest estimation: In causal forests, trees are grown on one half of the data and effects estimated on the other, preventing overfitting of the effect surface.
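As a concrete illustration of the doubly robust idea, here is a hand-rolled AIPW score for the ATE. The inputs mu1, mu0, and ehat stand for cross-fitted outcome and propensity predictions; the function name is illustrative rather than any library's API.

aipw_score_sketch.R
# Doubly robust AIPW score for the ATE.
# Y: outcome, A: binary treatment (0/1),
# mu1, mu0: cross-fitted predictions of E[Y | X, A = 1] and E[Y | X, A = 0],
# ehat: cross-fitted propensity scores E[A | X].
aipw_ate <- function(Y, A, mu1, mu0, ehat) {
  psi <- mu1 - mu0 +
    A * (Y - mu1) / ehat -
    (1 - A) * (Y - mu0) / (1 - ehat)
  c(ate = mean(psi), se = sd(psi) / sqrt(length(psi)))
}

If the outcome models are correct, the weighting terms average to zero; if the propensity model is correct, the weighting terms correct any bias in the outcome models. That is the double robustness.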
EXAMPLE — PARTIALLY LINEAR REGRESSION
dml_estimate.R
library(DoubleML)
library(mlr3)          # provides lrn()
library(mlr3learners)  # provides the ranger learners

# Partially linear regression via DML
# dml_data is a DoubleMLData object (see construction sketch below)
obj <- DoubleMLPLR$new(
  data = dml_data,
  ml_l = lrn("regr.ranger"),   # outcome nuisance model, E[Y | X]
  ml_m = lrn("regr.ranger"),   # treatment nuisance model, E[T | X]
  n_folds = 5,
  score = "partialling out"
)

obj$fit()
obj$summary()
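The dml_data object above is a DoubleMLData wrapper around a plain data frame. A minimal construction might look like the following; the column names y and d are placeholders for your outcome and treatment variables.

dml_data_sketch.R
library(DoubleML)

# wrap a data.frame, declaring outcome and treatment columns;
# the remaining columns are used as covariates
dml_data <- double_ml_data_from_data_frame(
  df, y_col = "y", d_cols = "d"
)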
Full pipeline with nuisance tuning, cross-fitting, and inference → Chapter 07 — Double ML
METHODS IN THIS SECTION
07 · DML

Double machine learning

Uses cross-fitted ML models to partial out controls from both treatment and outcome, then estimates the causal effect on the residuals. Removes regularization bias without sacrificing flexibility.

Assumptions: Conditional ignorability · Neyman orthogonality. Data: Cross-section or panel. Software: DoubleML (R/Python), econml.
08 · CF

Causal forest

Extends random forests to estimate heterogeneous treatment effects (CATE) at the individual level. Uses honest splitting and local centering to debias estimates across the covariate space.

Assumptions: Conditional ignorability · overlap. Data: Cross-section. Software: grf (R), econml.
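A minimal grf sketch on simulated data (variable names are illustrative):

causal_forest_sketch.R
library(grf)

set.seed(1)
n <- 2000
X <- matrix(rnorm(n * 5), n, 5)
W <- rbinom(n, 1, 0.5)                        # binary treatment
Y <- pmax(X[, 1], 0) * W + X[, 2] + rnorm(n)  # effect varies with X1

cf <- causal_forest(X, Y, W)         # honest splitting is the default
tau_hat <- predict(cf)$predictions   # out-of-bag CATE estimates
average_treatment_effect(cf)         # doubly robust ATE with std. error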
09 · DR

AIPW / DR learner

Augmented inverse probability weighting combines outcome and propensity models into a doubly robust score — consistent if either nuisance model is correctly specified, and semiparametrically efficient when both are.

Assumptions: Conditional ignorability · overlap. Data: Cross-section. Software: DoubleML, econml, AIPW (R).
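For a binary treatment, DoubleML implements the AIPW-style score in its interactive regression model. A minimal sketch mirroring the PLR example above, with dml_data as before but holding a binary treatment column:

irm_sketch.R
library(DoubleML)
library(mlr3)
library(mlr3learners)

obj_irm <- DoubleMLIRM$new(
  data  = dml_data,
  ml_g  = lrn("regr.ranger"),                           # outcome model
  ml_m  = lrn("classif.ranger", predict_type = "prob"), # propensity model
  n_folds = 5,
  score = "ATE"
)
obj_irm$fit()
obj_irm$summary()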
10 · HTE

Heterogeneous treatment effects

A family of methods — X-learner, R-learner, BART — for estimating how treatment effects vary across subgroups or individual covariates. Requires careful validation to avoid overfitting.

Assumptions: Conditional ignorability · overlap. Data: Cross-section or RCT. Software: grf, econml, lasso2 (R).
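A compact X-learner sketch with random forests; the function and variable names are illustrative, not a library API.

xlearner_sketch.R
library(ranger)

# X-learner: impute individual effects from crossed outcome models,
# then regress the imputations on X and blend by the propensity score.
# Y: outcome, A: binary treatment (0/1), X: data.frame of covariates.
xlearner_cate <- function(Y, A, X) {
  mu1 <- ranger(Y ~ ., data = data.frame(Y = Y[A == 1], X[A == 1, ]))
  mu0 <- ranger(Y ~ ., data = data.frame(Y = Y[A == 0], X[A == 0, ]))

  # imputed treatment effects on each arm
  imp1 <- Y[A == 1] - predict(mu0, X[A == 1, ])$predictions
  imp0 <- predict(mu1, X[A == 0, ])$predictions - Y[A == 0]

  tau1 <- ranger(imp ~ ., data = data.frame(imp = imp1, X[A == 1, ]))
  tau0 <- ranger(imp ~ ., data = data.frame(imp = imp0, X[A == 0, ]))

  # propensity-weighted combination of the two CATE models
  g <- ranger(A ~ ., data = data.frame(A = factor(A), X), probability = TRUE)
  e <- predict(g, X)$predictions[, "1"]
  e * predict(tau0, X)$predictions + (1 - e) * predict(tau1, X)$predictions
}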
11 · TMLE

Targeted maximum likelihood

A doubly robust, one-step efficient estimator that updates an initial ML outcome model via a targeting step that solves the efficient score equation. As a plug-in estimator it respects parameter bounds (e.g., probabilities stay in [0, 1]) and achieves semiparametric efficiency when the nuisance models converge.

Assumptions: Conditional ignorability · overlap · consistency. Data: Cross-section. Software: tmle / tmle3 (R), zepid (Python).
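A minimal tmle sketch on simulated data; by default the package fits both nuisance models with SuperLearner. The simulated design is illustrative.

tmle_sketch.R
library(tmle)

set.seed(1)
n <- 1000
W <- data.frame(W1 = rnorm(n), W2 = rnorm(n))   # covariates
A <- rbinom(n, 1, plogis(0.4 * W$W1))           # binary treatment
Y <- A + W$W1 + rnorm(n)                        # true ATE = 1

fit <- tmle(Y = Y, A = A, W = W)
fit$estimates$ATE   # point estimate, variance, CI, p-value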