MLMay 19, 2022
Deep Learning Methods for Proximal Inference via Maximum Moment RestrictionBenjamin Kompa, David R. Bellamy, Thomas Kolokotrones et al. · harvard
The No Unmeasured Confounding Assumption is widely used to identify causal effects in observational studies. Recent work on proximal inference has provided alternative identification results that succeed even in the presence of unobserved confounders, provided that one has measured a sufficiently rich set of proxy variables, satisfying specific structural conditions. However, proximal inference requires solving an ill-posed integral equation. Previous approaches have used a variety of machine learning techniques to estimate a solution to this integral equation, commonly referred to as the bridge function. However, prior work has often been limited by relying on pre-specified kernel functions, which are not data adaptive and struggle to scale to large datasets. In this work, we introduce a flexible and scalable method based on a deep neural network to estimate causal effects in the presence of unmeasured confounding using proximal inference. Our method achieves state of the art performance on two well-established proximal inference benchmarks. Finally, we provide theoretical consistency guarantees for our method.
MEOct 11, 2022
Causal and Counterfactual Views of Missing Data ModelsRazieh Nabi, Rohit Bhattacharya, Ilya Shpitser et al.
It is often said that the fundamental problem of causal inference is a missing data problem -- the comparison of responses to two hypothetical treatment assignments is made difficult because for every experimental unit only one potential response is observed. In this paper, we consider the implications of the converse view: that missing data problems are a form of causal inference. We make explicit how the missing data problem of recovering the complete data law from the observed law can be viewed as identification of a joint distribution over counterfactual variables corresponding to values had we (possibly contrary to fact) been able to observe them. Drawing analogies with causal inference, we show how identification assumptions in missing data can be encoded in terms of graphical models defined over counterfactual and observed variables. We review recent results in missing data identification from this viewpoint. In doing so, we note interesting similarities and differences between missing data and causal identification theories.
MEOct 16, 2025
Response to Discussions of "Causal and Counterfactual Views of Missing Data Models"Razieh Nabi, Rohit Bhattacharya, Ilya Shpitser et al.
We are grateful to the discussants, Levis and Kennedy [2025], Luo and Geng [2025], Wang and van der Laan [2025], and Yang and Kim [2025], for their thoughtful comments on our paper (Nabi et al., 2025). In this rejoinder, we summarize our main contributions and respond to each discussion in turn.
STJan 24, 2024
Assumptions and Bounds in the Instrumental Variable ModelThomas S. Richardson, James M. Robins
In this note we give proofs for results relating to the Instrumental Variable (IV) model with binary response $Y$ and binary treatment $X$, but with an instrument $Z$ with $K$ states. These results were originally stated in Richardson & Robins (2014), "ACE Bounds; SEMS with Equilibrium Conditions," arXiv:1410.0470.
MEAug 7, 2020
Rejoinder: On nearly assumption-free tests of nominal confidence interval coverage for causal parameters estimated by machine learningLin Liu, Rajarshi Mukherjee, James M. Robins
This is the rejoinder to the discussion by Kennedy, Balakrishnan and Wasserman on the paper "On nearly assumption-free tests of nominal confidence interval coverage for causal parameters estimated by machine learning" published in Statistical Science.
MLJun 29, 2019
Identification In Missing Data Models Represented By Directed Acyclic GraphsRohit Bhattacharya, Razieh Nabi, Ilya Shpitser et al.
Missing data is a pervasive problem in data analyses, resulting in datasets that contain censored realizations of a target distribution. Many approaches to inference on the target distribution using censored observed data, rely on missing data models represented as a factorization with respect to a directed acyclic graph. In this paper we consider the identifiability of the target distribution within this class of models, and show that the most general identification strategies proposed so far retain a significant gap in that they fail to identify a wide class of identifiable distributions. To address this gap, we propose a new algorithm that significantly generalizes the types of manipulations used in the ID algorithm, developed in the context of causal inference, in order to obtain identification.
MLApr 8, 2019
On nearly assumption-free tests of nominal confidence interval coverage for causal parameters estimated by machine learningLin Liu, Rajarshi Mukherjee, James M. Robins
For many causal effect parameters of interest, doubly robust machine learning (DRML) estimators $\hatψ_{1}$ are the state-of-the-art, incorporating the good prediction performance of machine learning; the decreased bias of doubly robust estimators; and the analytic tractability and bias reduction of sample splitting with cross fitting. Nonetheless, even in the absence of confounding by unmeasured factors, the nominal $(1 - α)$ Wald confidence interval $\hatψ_{1} \pm z_{α/ 2} \widehat{\mathsf{se}} [\hatψ_{1}]$ may still undercover even in large samples, because the bias of $\hatψ_{1}$ may be of the same or even larger order than its standard error of order $n^{-1/2}$. In this paper, we introduce essentially assumption-free tests that (i) can falsify the null hypothesis that the bias of $\hatψ_{1}$ is of smaller order than its standard error, (ii) can provide an upper confidence bound on the true coverage of the Wald interval, and (iii) are valid under the null under no smoothness/sparsity assumptions on the nuisance parameters. The tests, which we refer to as \underline{A}ssumption \underline{F}ree \underline{E}mpirical \underline{C}overage \underline{T}ests (AFECTs), are based on a U-statistic that estimates part of the bias of $\hatψ_{1}$.
STApr 7, 2019
A unifying approach for doubly-robust $\ell_1$ regularized estimation of causal contrastsEzequiel Smucler, Andrea Rotnitzky, James M. Robins
We consider inference about a scalar parameter under a non-parametric model based on a one-step estimator computed as a plug in estimator plus the empirical mean of an estimator of the parameter's influence function. We focus on a class of parameters that have influence function which depends on two infinite dimensional nuisance functions and such that the bias of the one-step estimator of the parameter of interest is the expectation of the product of the estimation errors of the two nuisance functions. Our class includes many important treatment effect contrasts of interest in causal inference and econometrics, such as ATE, ATT, an integrated causal contrast with a continuous treatment, and the mean of an outcome missing not at random. We propose estimators of the target parameter that entertain approximately sparse regression models for the nuisance functions allowing for the number of potential confounders to be even larger than the sample size. By employing sample splitting, cross-fitting and $\ell_1$-regularized regression estimators of the nuisance functions based on objective functions whose directional derivatives agree with those of the parameter's influence function, we obtain estimators of the target parameter with two desirable robustness properties: (1) they are rate doubly-robust in that they are root-n consistent and asymptotically normal when both nuisance functions follow approximately sparse models, even if one function has a very non-sparse regression coefficient, so long as the other has a sufficiently sparse regression coefficient, and (2) they are model doubly-robust in that they are root-n consistent and asymptotically normal even if one of the nuisance functions does not follow an approximately sparse model so long as the other nuisance function follows an approximately sparse model with a sufficiently sparse regression coefficient.
MLNov 17, 2014
Influence Functions for Machine Learning: Nonparametric Estimators for Entropies, Divergences and Mutual InformationsKirthevasan Kandasamy, Akshay Krishnamurthy, Barnabas Poczos et al.
We propose and analyze estimators for statistical functionals of one or more distributions under nonparametric assumptions. Our estimators are based on the theory of influence functions, which appear in the semiparametric statistics literature. We show that estimators based either on data-splitting or a leave-one-out technique enjoy fast rates of convergence and other favorable theoretical properties. We apply this framework to derive estimators for several popular information theoretic quantities, and via empirical evaluation, show the advantage of this approach over existing estimators.
LGSep 26, 2013
Sparse Nested Markov models with Log-linear ParametersIlya Shpitser, Robin J. Evans, Thomas S. Richardson et al.
Hidden variables are ubiquitous in practical data analysis, and therefore modeling marginal densities and doing inference with the resulting models is an important problem in statistics, machine learning, and causal inference. Recently, a new type of graphical model, called the nested Markov model, was developed which captures equality constraints found in marginals of directed acyclic graph (DAG) models. Some of these constraints, such as the so called `Verma constraint', strictly generalize conditional independence. To make modeling and inference with nested Markov models practical, it is necessary to limit the number of parameters in the model, while still correctly capturing the constraints in the marginal of a DAG model. Placing such limits is similar in spirit to sparsity methods for undirected graphical models, and regression models. In this paper, we give a log-linear parameterization which allows sparse modeling with nested Markov models. We illustrate the advantages of this parameterization with a simulation study.
AIFeb 20, 2013
Probabilistic Evaluation of Sequential Plans from Causal Models with Hidden VariablesJudea Pearl, James M. Robins
The paper concerns the probabilistic evaluation of plans in the presence of unmeasured variables, each plan consisting of several concurrent or sequential actions. We establish a graphical criterion for recognizing when the effects of a given plan can be predicted from passive observations on measured variables only. When the criterion is satisfied, a closed-form expression is provided for the probability that the plan will achieve a specified goal.
MLJul 20, 2012
Parameter and Structure Learning in Nested Markov ModelsIlya Shpitser, Thomas S. Richardson, James M. Robins et al.
The constraints arising from DAG models with latent variables can be naturally represented by means of acyclic directed mixed graphs (ADMGs). Such graphs contain directed and bidirected arrows, and contain no directed cycles. DAGs with latent variables imply independence constraints in the distribution resulting from a 'fixing' operation, in which a joint distribution is divided by a conditional. This operation generalizes marginalizing and conditioning. Some of these constraints correspond to identifiable 'dormant' independence constraints, with the well known 'Verma constraint' as one example. Recently, models defined by a set of the constraints arising after fixing from a DAG with latents, were characterized via a recursive factorization and a nested Markov property. In addition, a parameterization was given in the discrete case. In this paper we use this parameterization to describe a parameter fitting algorithm, and a search and score structure learning algorithm for these nested Markov models. We apply our algorithms to a variety of datasets.
MEMar 15, 2012
On the Validity of Covariate Adjustment for Estimating Causal EffectsIlya Shpitser, Tyler VanderWeele, James M. Robins
Identifying effects of actions (treatments) on outcome variables from observational data and causal assumptions is a fundamental problem in causal inference. This identification is made difficult by the presence of confounders which can be related to both treatment and outcome variables. Confounders are often handled, both in theory and in practice, by adjusting for covariates, in other words considering outcomes conditioned on treatment and covariate values, weighed by probability of observing those covariate values. In this paper, we give a complete graphical criterion for covariate adjustment, which we term the adjustment criterion, and derive some interesting corollaries of the completeness of this criterion.
LGFeb 14, 2012
An Efficient Algorithm for Computing Interventional Distributions in Latent Variable Causal ModelsIlya Shpitser, Thomas S. Richardson, James M. Robins
Probabilistic inference in graphical models is the task of computing marginal and conditional densities of interest from a factorized representation of a joint probability distribution. Inference algorithms such as variable elimination and belief propagation take advantage of constraints embedded in this factorization to compute such densities efficiently. In this paper, we propose an algorithm which computes interventional distributions in latent variable causal models represented by acyclic directed mixed graphs(ADMGs). To compute these distributions efficiently, we take advantage of a recursive factorization which generalizes the usual Markov factorization for DAGs and the more recent factorization for ADMGs. Our algorithm can be viewed as a generalization of variable elimination to the mixed graph case. We show our algorithm is exponential in the mixed graph generalization of treewidth.