Causal Methods
CHAPTER 11
TMLE

Targeted maximum likelihood estimation

A semiparametric, doubly-robust estimator of the ATE that adds a targeting step to an initial ML outcome model, solving the efficient score equation directly. Widely used in epidemiology and biostatistics for its plug-in efficiency and natural compatibility with SuperLearner ensembles.

IDENTIFICATION SETUP
01 Doubly robust via targeting

Like AIPW, TMLE is consistent if either the outcome model Q̂₀(A, W) or the propensity score ê(W) is correctly specified. The targeting step — not augmentation — is what achieves this: it updates Q̂₀ using the clever covariate so the efficient score evaluates to zero.

02 Plug-in estimator

TMLE produces a plug-in ATE: the mean of the updated counterfactual predictions Q̂*(1, W) − Q̂*(0, W). Because it updates the outcome model directly rather than adding a correction term, it naturally respects model bounds — for binary Y the estimate stays in [0, 1].

03 One-step efficiency

When both nuisance models converge at rates faster than n^{−1/4} (so the product of their errors is o(n^{−1/2})), TMLE achieves the semiparametric efficiency bound — the same as AIPW. The key difference: TMLE solves the efficient score equation by construction, making the plug-in step theoretically cleaner than the ad hoc augmentation in AIPW.

TMLE PIPELINE

[Diagram: TMLE pipeline — covariates W feed the initial outcome model Q̂₀ = E[Y | A, W] and the propensity score ê = P(A=1 | W); the clever covariate H drives the targeting step, producing the updated outcome Q̂* and a plug-in ATE that solves the efficient score equation and respects model bounds.]

Q̂₀(A,W): initial outcome model E[Y | A, W], fit by any ML method
ê(W): propensity score P(A=1 | W), also fit by ML
H(A,W): clever covariate A/ê(W) − (1−A)/(1−ê(W))
Q̂*(A,W): updated outcome model Q̂₀ + ε·H after targeting
ATE: plug-in mean(Q̂*(1,W) − Q̂*(0,W)) over all units
ε: targeting coefficient, the size of the update to Q̂₀
Targeting step and plug-in ATE
H(A,W) = A / ê(W) − (1−A) / (1−ê(W))
ε = argmin_ε loss(Y, Q̂₀ + ε·H)   [one-dimensional logistic / linear regression]
Q̂*(A,W) = Q̂₀(A,W) + ε · H(A,W)   [on the logit scale for binary Y]
ATE = (1/n) Σᵢ [ Q̂*(1, Wᵢ) − Q̂*(0, Wᵢ) ]
EIF = Q̂*(1,W) − Q̂*(0,W) + H·(Y − Q̂*) − ATE,   SE = sqrt(Var(EIF)/n)
ASSUMPTIONS
Conditional ignorability (required)

Given W, treatment is independent of potential outcomes. TMLE inherits the same unconfoundedness requirement as IPW, matching, or AIPW. Double robustness relaxes the modeling requirement, not the identification assumption.

HOW TO TEST

Theoretical argument. There is no statistical test for this — it must be justified by study design and domain knowledge. Sensitivity analysis (e-values, Rosenbaum bounds) can quantify robustness to hidden confounding.

Overlap (positivity) (required)

Every unit must have a nonzero probability of treatment: 0 < P(A=1 | W) < 1. The clever covariate H = A/ê − (1−A)/(1−ê) explodes as ê approaches 0 or 1, inflating variance and destabilizing the targeting step.

HOW TO TEST

Inspect propensity score histogram by treatment arm. Flag units with PS < 0.05 or > 0.95. Clip propensity scores before computing H and report sensitivity to the clipping threshold.
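The instability and the clipping fix can be sketched in a few lines of plain Python (the 0.05 / 0.95 bounds below mirror the flagging thresholds above; report sensitivity to this choice):

```python
def clever_covariate(a, e):
    # H(A, W) = A/e(W) - (1 - A)/(1 - e(W)); diverges as e -> 0 or 1
    return a / e - (1 - a) / (1 - e)

def clip_ps(e, lo=0.05, hi=0.95):
    # truncate the propensity score to [lo, hi] before forming H
    return min(max(e, lo), hi)

h_raw     = clever_covariate(1, 0.001)           # ~1000: one unit dominates the update
h_clipped = clever_covariate(1, clip_ps(0.001))  # 20: bounded contribution
```

A treated unit with ê ≈ 0.001 contributes a weight of roughly 1000 to the targeting regression; after clipping at 0.05 its contribution is capped at 20.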

Consistency and SUTVA (required)

The observed outcome equals the potential outcome under the observed treatment: Y = Y(A). No interference between units. SUTVA violations — e.g. spillovers, general equilibrium effects — invalidate the potential outcomes framework.

HOW TO TEST

Assess whether spillovers are plausible given the intervention. Consider clustered randomization designs or partial-equilibrium IV arguments if interference is likely.

At least one nuisance model correct (required)

Double robustness guarantees consistency if either Q̂₀ or ê converges to the truth. Using flexible ML (SuperLearner) for both substantially reduces the risk of misspecification. When both are misspecified, TMLE is biased but may still outperform a naive plug-in.

HOW TO TEST

Cross-validated RMSE for Q̂₀ and AUC for ê. Compare ATE across different learner ensembles. A large targeting coefficient ε signals the initial Q̂₀ was far from optimal.

DATA REQUIREMENTS

Binary or continuous outcome

TMLE handles binary Y (use logistic loss for the targeting step) and continuous Y (linear targeting). Binary Y is particularly natural for TMLE because the plug-in update respects the [0, 1] bounds — a property AIPW's additive correction does not automatically guarantee.

Binary treatment

The standard TMLE targets a binary treatment. Extensions exist for multi-valued and continuous treatments via generalized propensity scores, implemented in the tmle3 / tlverse framework. The clever covariate generalizes to these settings but implementation is more complex.

Sample size and cross-fitting

Cross-fitting is optional in TMLE (unlike DML where it is required for bias elimination), but is recommended when using flexible ML nuisance models. A practical minimum is ~200 units. SuperLearner benefits from larger samples for ensemble weights to stabilize.
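The fold logic behind cross-fitting is simple; here is a plain-Python scaffold where a trivial training-fold mean stands in for the ML learner (swap in SuperLearner or any fit/predict pair in practice). The point is that each unit's nuisance prediction comes from a model that never saw that unit's fold:

```python
import random

def out_of_fold_predictions(y, k=5, seed=1):
    # Cross-fitting scaffold: shuffle indices, split into k folds, and
    # predict each fold with a "model" fit on the other k-1 folds.
    # The placeholder learner is the training-fold mean; replace it with
    # a real fit (SuperLearner, ranger, glmnet, ...) in an actual analysis.
    n = len(y)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    preds = [None] * n
    for fold in folds:
        held_out = set(fold)
        train = [y[i] for i in range(n) if i not in held_out]
        fit = sum(train) / len(train)   # placeholder learner
        for i in fold:
            preds[i] = fit
    return preds

preds = out_of_fold_predictions([float(v) for v in range(10)])
```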

01_data_prep.R
library(tidyverse)

data <- read_csv("observational_data.csv")

# TMLE requires: outcome Y, binary treatment A, covariates W
# Binary or continuous outcome supported

Y <- data$outcome
A <- data$treatment
W <- data |>
  select(age, income, education, region, employment,
         starts_with("covar_")) |>
  as.data.frame()

# Overlap check — TMLE is sensitive to near-zero propensity scores
ps_model <- glm(A ~ ., data = cbind(W, A = A), family = binomial)
ps <- fitted(ps_model)

# Flag extreme scores
cat("PS < 0.05:", sum(ps < 0.05), "\n")
cat("PS > 0.95:", sum(ps > 0.95), "\n")

# Trim to common support if needed
keep <- ps > 0.05 & ps < 0.95
cat("Units retained after trimming:", sum(keep), "/", length(keep), "\n")
TARGETING STEP

The targeting step is what makes TMLE more than a plug-in estimator. It finds a one-dimensional perturbation ε of the initial outcome model Q̂₀ such that the efficient score equation is solved — guaranteeing that the resulting plug-in ATE is locally efficient and doubly robust.

The clever covariate H

H(A, W) = A/ê(W) − (1−A)/(1−ê(W)) encodes the propensity score into a covariate that, when regressed against the outcome residual, adjusts Q̂₀ in the direction of the efficient score. It is 'clever' because it targets the ATE — the specific estimand — not the whole outcome surface.

One-dimensional update

The targeting regression has a single coefficient ε. For continuous Y, this is a linear regression of (Y − Q̂₀) on H with no intercept. For binary Y, it is a logistic regression of Y on H with logit(Q̂₀) as a fixed offset. The resulting ε is small when Q̂₀ is already close to optimal.
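Both one-dimensional regressions are short enough to write out. A minimal pure-Python sketch (function names are illustrative; the binary case solves the offset-logistic score by Newton steps and assumes Q̂₀ lies strictly inside (0, 1)):

```python
import math

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def target_epsilon_continuous(y, q0, h):
    # Linear targeting: no-intercept regression of (Y - Q0hat) on H,
    # so epsilon = sum(H * (Y - Q0hat)) / sum(H^2).
    return sum(hi * (yi - qi) for hi, yi, qi in zip(h, y, q0)) / \
        sum(hi * hi for hi in h)

def target_epsilon_binary(y, q0, h, iters=30):
    # Logistic targeting: regress Y on H with logit(Q0hat) as a fixed offset.
    # Newton's method on the score sum_i H_i * (Y_i - expit(logit(Q0_i) + eps*H_i)).
    offset = [math.log(q / (1.0 - q)) for q in q0]
    eps = 0.0
    for _ in range(iters):
        p = [expit(o + eps * hi) for o, hi in zip(offset, h)]
        score = sum(hi * (yi - pi) for hi, yi, pi in zip(h, y, p))
        info = sum(hi * hi * pi * (1.0 - pi) for hi, pi in zip(h, p))
        eps += score / info
    return eps
```

When Q̂₀ already fits well, both routines return an ε near zero and the targeting step barely moves the plug-in.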

Why not just use AIPW?

AIPW adds the IPW correction term to Q̂₀'s plug-in. TMLE instead updates Q̂₀ directly. Both achieve the same semiparametric efficiency bound, but TMLE's plug-in form always respects the outcome model's range. For binary Y, AIPW can produce estimates outside [0, 1]; TMLE cannot.

EIF and inference

The efficient influence function (EIF) for the ATE under TMLE is EIFᵢ = Q̂*(1,Wᵢ) − Q̂*(0,Wᵢ) + H(Aᵢ,Wᵢ)·(Yᵢ − Q̂*(Aᵢ,Wᵢ)) − ATE. The variance of the ATE estimator is Var(EIF)/n. The targeting step ensures mean(EIF) ≈ 0 in the sample.

INITIAL NUISANCE MODELS (Q̂₀ AND ê)

02_initial_fit.R
library(SuperLearner)

# TMLE step 1 & 2: fit initial outcome model Q̂₀ and propensity ê
# SuperLearner ensembles multiple learners via cross-validation

sl_libs <- c("SL.ranger", "SL.glmnet", "SL.mean")

# Fit propensity score ê(W) = P(A=1 | W)
g_fit <- SuperLearner(
  Y = A,
  X = W,
  family = binomial(),
  SL.library = sl_libs
)
g_hat <- g_fit$SL.predict   # P(A=1 | W)

# Fit initial outcome model Q̂₀(A, W) = E[Y | A, W]
# Stack treatment with covariates for the outcome model
AW <- cbind(A = A, W)
q_fit <- SuperLearner(
  Y = Y,
  X = AW,
  family = gaussian(),   # use binomial() if Y is binary
  SL.library = sl_libs
)

# Counterfactual predictions
AW1 <- cbind(A = 1, W)   # everyone treated
AW0 <- cbind(A = 0, W)   # everyone control

Q1_hat <- predict(q_fit, newdata = AW1)$pred
Q0_hat <- predict(q_fit, newdata = AW0)$pred
Q_hat  <- predict(q_fit, newdata = AW)$pred   # observed
ESTIMATION

Two implementation paths: the tmle R package wraps the full pipeline including SuperLearner and the targeting step. In Python, the manual implementation below walks through each step explicitly — useful for understanding the mechanics or customizing the estimator.

FULL TMLE PIPELINE

03_tmle.R
library(tmle)

# TMLE via the tmle package
# Handles the targeting step automatically using the clever covariate

tmle_fit <- tmle(
  Y = Y,
  A = A,
  W = W,
  Q.SL.library = c("SL.ranger", "SL.glmnet", "SL.mean"),  # outcome model
  g.SL.library = c("SL.ranger", "SL.glmnet", "SL.mean"),  # propensity model
  family = "gaussian"   # use "binomial" for binary outcome
)

# Results
summary(tmle_fit)

# ATE with 95% CI
tmle_fit$estimates$ATE$psi      # point estimate
tmle_fit$estimates$ATE$CI       # 95% confidence interval
tmle_fit$estimates$ATE$pvalue   # p-value

# EIF-based standard error
sqrt(tmle_fit$estimates$ATE$var.psi)
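The manual Python path, as a minimal self-contained sketch (continuous Y, linear fluctuation). It takes the nuisance predictions as plain lists; the synthetic demo deliberately uses the true nuisance functions as the "fits" purely to illustrate the mechanics. In a real analysis Q̂₀(A,W), Q̂₀(1,W), Q̂₀(0,W), and ê(W) come from cross-validated ML fits:

```python
import math
import random

def tmle_ate(y, a, q, q1, q0, g):
    # Targeting step + plug-in ATE for continuous Y (linear fluctuation).
    # q, q1, q0: initial outcome predictions at (A,W), (1,W), (0,W); g: P(A=1|W).
    n = len(y)
    h  = [ai / gi - (1 - ai) / (1 - gi) for ai, gi in zip(a, g)]  # clever covariate
    h1 = [1.0 / gi for gi in g]                                    # H at A = 1
    h0 = [-1.0 / (1 - gi) for gi in g]                             # H at A = 0
    # one-dimensional targeting: no-intercept regression of (Y - Q0hat) on H
    eps = sum(hi * (yi - qi) for hi, yi, qi in zip(h, y, q)) / \
        sum(hi * hi for hi in h)
    qs  = [qi + eps * hi for qi, hi in zip(q, h)]
    qs1 = [qi + eps * hi for qi, hi in zip(q1, h1)]
    qs0 = [qi + eps * hi for qi, hi in zip(q0, h0)]
    ate = sum(x1 - x0 for x1, x0 in zip(qs1, qs0)) / n
    # efficient influence function -> standard error
    eif = [x1 - x0 + hi * (yi - xs) - ate
           for x1, x0, hi, yi, xs in zip(qs1, qs0, h, y, qs)]
    se = math.sqrt(sum(e * e for e in eif) / n) / math.sqrt(n)
    return ate, se, eps, eif

# synthetic demo with true ATE = 1
rng = random.Random(0)
n = 2000
w = [rng.gauss(0, 1) for _ in range(n)]
g = [1.0 / (1.0 + math.exp(-0.5 * wi)) for wi in w]   # true propensity
a = [1 if rng.random() < gi else 0 for gi in g]
y = [ai + wi + rng.gauss(0, 1) for ai, wi in zip(a, w)]

# stand-in "fits": the true nuisance functions (illustration only)
q  = [ai + wi for ai, wi in zip(a, w)]
q1 = [1.0 + wi for wi in w]
q0 = [0.0 + wi for wi in w]

ate, se, eps, eif = tmle_ate(y, a, q, q1, q0, g)
```

After targeting, the sample mean of the EIF is zero by construction for the linear fluctuation, which is exactly the diagnostic checked in 04_diagnostics.R.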
OUTPUT INTERPRETATION

My TMLE ATE and AIPW ATE are nearly identical — which should I report?

Both are valid doubly-robust, semiparametrically efficient estimators. Prefer TMLE when your outcome is binary and you want to guarantee the estimate stays in [0, 1], or when you are fitting in the tlverse / SuperLearner ecosystem. Prefer AIPW when you are already using econml or the DoubleML stack and want consistency with other estimators in your analysis. Agreement between the two is reassuring — divergence suggests numerical issues or extreme propensity scores.

The targeting coefficient ε is large — what does that mean?

A large ε indicates that the initial outcome model Q̂₀ was far from solving the efficient score equation. This often happens when the initial ML model overfits or when the propensity score is very predictive. The targeting step corrects for this, but a very large ε can inflate variance. Consider using more regularized initial models or cross-fitting, and always inspect the EIF mean after targeting.

How do I interpret the EIF-based standard error?

The EIF standard error is SE = sqrt(Var(EIF) / n), where EIF is the efficient influence function evaluated at the TMLE estimate. After the targeting step, mean(EIF) should be approximately zero in your sample — if it is not, the nuisance models are misfitting. The EIF-based SE is semiparametrically efficient: no regular estimator has a smaller asymptotic variance under conditional ignorability.

Should I use tmle, tmle3, or zepid?

Use the tmle R package (van der Laan, Gruber) for a well-tested, standard TMLE implementation that integrates directly with SuperLearner. Use tmle3 (part of the tlverse ecosystem) for more modern syntax and extensions to complex longitudinal and mediation estimands. In Python, zepid provides a relatively mature TMLE implementation; alternatively, the manual implementation above gives you full control and is easy to audit.

DIAGNOSTICS

04_diagnostics.R
library(tmle)
library(tidyverse)

# 1. Propensity score distribution
g_hat <- tmle_fit$g$g1W
hist(g_hat, breaks = 40,
     main = "Propensity score distribution",
     xlab = "P(A=1|W)")
abline(v = c(0.05, 0.95), lty = 2, col = "red")

# 2. EIF mean — should be close to zero
eif <- tmle_fit$estimates$IC$IC.ATE
cat("EIF mean:", round(mean(eif), 6), "\n")
# If far from zero, nuisance models are misfitting

# 3. Targeting coefficient ε — should be small after good initial fit
# Retrieve from internal TMLE object
cat("Epsilon:", round(tmle_fit$epsilon, 6), "\n")
# Large |ε| suggests the initial Q̂₀ was far from optimal

# 4. Sensitivity to learner choice
for (lib in list(
    c("SL.glm"),
    c("SL.ranger"),
    c("SL.ranger", "SL.glmnet", "SL.mean")
)) {
  fit <- tmle(Y = Y, A = A, W = W,
              Q.SL.library = lib, g.SL.library = lib,
              family = "gaussian")
  cat("Library:", paste(lib, collapse = "+"),
      "| ATE:", round(fit$estimates$ATE$psi, 4), "\n")
}