Victor Chernozhukov

h-index: 80

Papers: 31

Citations: 1,892

Novelty: 46%

AI Score: 33

Ranked #134,626 of 201,326 authors (top 67%) · #1,855 in ML (top 53%)

31 Papers

LG · Jul 26, 2022

Future-Dependent Value-Based Off-Policy Evaluation in POMDPs

Masatoshi Uehara, Haruka Kiyohara, Andrew Bennett et al. · harvard

We study off-policy evaluation (OPE) for partially observable MDPs (POMDPs) with general function approximation. Existing methods such as sequential importance sampling estimators and fitted-Q evaluation suffer from the curse of horizon in POMDPs. To circumvent this problem, we develop a novel model-free OPE method by introducing future-dependent value functions that take future proxies as inputs. Future-dependent value functions play a role similar to that of classical value functions in fully observable MDPs. We derive a new Bellman equation for future-dependent value functions as conditional moment equations that use history proxies as instrumental variables. We further propose a minimax learning method to learn future-dependent value functions using the new Bellman equation. We obtain a PAC result, which implies that our OPE estimator is consistent as long as futures and histories contain sufficient information about latent states and Bellman completeness holds. Finally, we extend our methods to learning of dynamics and establish the connection between our approach and the well-known spectral learning methods in POMDPs.

GN · Apr 28, 2023

Hedonic Prices and Quality Adjusted Price Indices Powered by AI

Patrick Bajari, Zhihao Cen, Victor Chernozhukov et al.

We develop empirical models that efficiently process large amounts of unstructured product data (text, images, prices, quantities) to produce accurate hedonic price estimates and derived indices. To achieve this, we generate abstract product attributes (or "features") from descriptions and images using deep neural networks. These attributes are then used to estimate the hedonic price function. To demonstrate the effectiveness of this approach, we apply the models to Amazon's data for first-party apparel sales, and estimate hedonic prices. The resulting models have a very high out-of-sample predictive accuracy, with $R^2$ ranging from $80\%$ to $90\%$. Finally, we construct the AI-based hedonic Fisher price index, chained at the year-over-year frequency, and contrast it with the CPI and other electronic indices.
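
The Fisher index mentioned in this abstract is, in its textbook form, the geometric mean of the Laspeyres and Paasche indices. The sketch below shows that standard formula and a simple chaining step. It is not the authors' implementation: it uses observed prices for matched products, whereas a hedonic index would typically plug in model-predicted prices, and the product names, prices, and quantities are made up for illustration.

```python
from math import sqrt

def laspeyres(p0, p1, q0):
    """Price index weighted by base-period quantities."""
    return sum(p1[k] * q0[k] for k in q0) / sum(p0[k] * q0[k] for k in q0)

def paasche(p0, p1, q1):
    """Price index weighted by current-period quantities."""
    return sum(p1[k] * q1[k] for k in q1) / sum(p0[k] * q1[k] for k in q1)

def fisher(p0, p1, q0, q1):
    """Fisher index: geometric mean of the Laspeyres and Paasche indices."""
    return sqrt(laspeyres(p0, p1, q0) * paasche(p0, p1, q1))

def chain(links):
    """Cumulate period-over-period index links into a chained index level."""
    level, levels = 1.0, []
    for link in links:
        level *= link
        levels.append(level)
    return levels

# Hypothetical toy data: two products observed in both periods.
p0 = {"shirt": 10.0, "jeans": 40.0}   # base-period prices
p1 = {"shirt": 12.0, "jeans": 38.0}   # current-period prices
q0 = {"shirt": 100, "jeans": 50}      # base-period quantities
q1 = {"shirt": 80, "jeans": 60}       # current-period quantities

print(laspeyres(p0, p1, q0), paasche(p0, p1, q1), fisher(p0, p1, q0, q1))
```

Because it is a geometric mean, the Fisher value always lies between the Paasche and Laspeyres values.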

EM · Mar 25, 2022

Automatic Debiased Machine Learning for Dynamic Treatment Effects and General Nested Functionals

Victor Chernozhukov, Whitney Newey, Rahul Singh et al.

We extend the idea of automated debiased machine learning to the dynamic treatment regime and more generally to nested functionals. We show that the multiply robust formula for the dynamic treatment regime with discrete treatments can be re-stated in terms of a recursive Riesz representer characterization of nested mean regressions. We then apply a recursive Riesz representer estimation learning algorithm that estimates de-biasing corrections without the need to characterize what the correction terms look like, such as, for instance, products of inverse probability weighting terms, as is done in prior work on doubly robust estimation in the dynamic regime. Our approach defines a sequence of loss minimization problems, whose minimizers are the multipliers of the de-biasing correction, hence circumventing the need for solving auxiliary propensity models and directly optimizing for the mean squared error of the target de-biasing correction. We provide further applications of our approach to estimation of dynamic discrete choice models and estimation of long-term effects with surrogates.

ML · May 10, 2021 · Code

Deeply-Debiased Off-Policy Interval Estimation

Chengchun Shi, Runzhe Wan, Victor Chernozhukov et al.

Off-policy evaluation learns a target policy's value with a historical dataset generated by a different behavior policy. In addition to a point estimate, many applications would benefit significantly from having a confidence interval (CI) that quantifies the uncertainty of the point estimate. In this paper, we propose a novel deeply-debiasing procedure to construct an efficient, robust, and flexible CI on a target policy's value. Our method is justified by theoretical results and numerical experiments. A Python implementation of the proposed procedure is available at https://github.com/RunzheStat/D2OPE.

ML · Apr 7, 2021 · Code

DoubleML -- An Object-Oriented Implementation of Double Machine Learning in Python

Philipp Bach, Victor Chernozhukov, Malte S. Kurz et al.

DoubleML is an open-source Python library implementing the double machine learning framework of Chernozhukov et al. (2018) for a variety of causal models. It contains functionalities for valid statistical inference on causal parameters when the estimation of nuisance parameters is based on machine learning methods. The object-oriented implementation of DoubleML provides high flexibility in terms of model specifications and makes it easily extendable. The package is distributed under the MIT license and relies on core libraries from the scientific Python ecosystem: scikit-learn, numpy, pandas, scipy, statsmodels and joblib. Source code, documentation and an extensive user guide can be found at https://github.com/DoubleML/doubleml-for-py and https://docs.doubleml.org.

ML · Mar 5, 2016 · Code

High-Dimensional Metrics in R

Victor Chernozhukov, Chris Hansen, Martin Spindler

The package High-dimensional Metrics (hdm) is an evolving collection of statistical methods for estimation and quantification of uncertainty in high-dimensional approximately sparse models. It focuses on providing confidence intervals and significance testing for (possibly many) low-dimensional subcomponents of the high-dimensional parameter vector. Efficient estimators and uniformly valid confidence intervals for regression coefficients on target variables (e.g., treatment or policy variable) in a high-dimensional approximately sparse regression model, for average treatment effect (ATE) and average treatment effect for the treated (ATET), as well as for extensions of these parameters to the endogenous setting are provided. Theory-grounded, data-driven methods for selecting the penalization parameter in Lasso regressions under heteroscedastic and non-Gaussian errors are implemented. Moreover, joint/simultaneous confidence intervals for regression coefficients of a high-dimensional sparse regression are implemented, including a joint significance test for Lasso regression. Data sets which have been used in the literature and might be useful for classroom demonstration and for testing new estimators are included. R and the package hdm are open-source software projects and can be freely downloaded from CRAN: http://cran.r-project.org.

EM · Mar 4, 2024

Applied Causal Inference Powered by ML and AI

Victor Chernozhukov, Christian Hansen, Nathan Kallus et al.

An introduction to the emerging fusion of machine learning and causal inference. The book presents ideas from classical structural equation models (SEMs) and their modern AI equivalent, directed acyclic graphs (DAGs) and structural causal models (SCMs), and covers Double/Debiased Machine Learning methods to do inference in such models using modern predictive tools.

LG · Feb 1, 2024

DoubleMLDeep: Estimation of Causal Effects with Multimodal Data

Sven Klaassen, Jan Teichert-Kluge, Philipp Bach et al.

This paper explores the use of unstructured, multimodal data, namely text and images, in causal inference and treatment effect estimation. We propose a neural network architecture that is adapted to the double machine learning (DML) framework, specifically the partially linear model. An additional contribution of our paper is a new method to generate a semi-synthetic dataset which can be used to evaluate the performance of causal effect estimation in the presence of text and images as confounders. The proposed methods and architectures are evaluated on the semi-synthetic dataset and compared to standard approaches, highlighting the potential benefit of using text and images directly in causal studies. Our findings have implications for researchers and practitioners in economics, marketing, finance, medicine and data science in general who are interested in estimating causal quantities using non-traditional data.

GN · Dec 31, 2024

Adventures in Demand Analysis Using AI

Philipp Bach, Victor Chernozhukov, Sven Klaassen et al.

This paper advances empirical demand analysis by integrating multimodal product representations derived from artificial intelligence (AI). Using a detailed dataset of toy cars on Amazon.com, we combine text descriptions, images, and tabular covariates to represent each product using transformer-based embedding models. These embeddings capture nuanced attributes, such as quality, branding, and visual characteristics, that traditional methods often struggle to summarize. Moreover, we fine-tune these embeddings for causal inference tasks. We show that the resulting embeddings substantially improve the predictive accuracy of sales ranks and prices and that they lead to more credible causal estimates of price elasticity. Notably, we uncover strong heterogeneity in price elasticity driven by these product-specific features. Our findings illustrate that AI-driven representations can enrich and modernize empirical demand analysis. The insights generated may also prove valuable for applied causal inference more broadly.

ME · Dec 10, 2024

Automatic Doubly Robust Forests

Zhaomeng Chen, Junting Duan, Victor Chernozhukov et al.

This paper proposes the automatic Doubly Robust Random Forest (DRRF) algorithm for estimating the conditional expectation of a moment functional in the presence of high-dimensional nuisance functions. DRRF extends the automatic debiasing framework based on the Riesz representer to the conditional setting and enables nonparametric, forest-based estimation (Athey et al., 2019; Oprescu et al., 2019). In contrast to existing methods, DRRF does not require prior knowledge of the form of the debiasing term or impose restrictive parametric or semi-parametric assumptions on the target quantity. Additionally, it is computationally efficient in making predictions at multiple query points. We establish consistency and asymptotic normality results for the DRRF estimator under general assumptions, allowing for the construction of valid confidence intervals. Through extensive simulations in heterogeneous treatment effect (HTE) estimation, we demonstrate the superior performance of DRRF over benchmark approaches in terms of estimation accuracy, robustness, and computational efficiency.

EM · Dec 26, 2021

Long Story Short: Omitted Variable Bias in Causal Machine Learning

Victor Chernozhukov, Carlos Cinelli, Whitney Newey et al.

We develop a general theory of omitted variable bias for a wide range of common causal parameters, including (but not limited to) averages of potential outcomes, average treatment effects, average causal derivatives, and policy effects from covariate shifts. Our theory applies to nonparametric models, while naturally allowing for (semi-)parametric restrictions (such as partial linearity) when such assumptions are made. We show how simple plausibility judgments on the maximum explanatory power of omitted variables are sufficient to bound the magnitude of the bias, thus facilitating sensitivity analysis in otherwise complex, nonlinear models. Finally, we provide flexible and efficient statistical inference methods for the bounds, which can leverage modern machine learning algorithms for estimation. These results allow empirical researchers to perform sensitivity analyses in a flexible class of machine-learned causal models using very simple, and interpretable, tools. We demonstrate the utility of our approach with two empirical examples.

LG · Oct 6, 2021

RieszNet and ForestRiesz: Automatic Debiased Machine Learning with Neural Nets and Random Forests

Victor Chernozhukov, Whitney K. Newey, Victor Quintas-Martinez et al.

Many causal and policy effects of interest are defined by linear functionals of high-dimensional or non-parametric regression functions. $\sqrt{n}$-consistent and asymptotically normal estimation of the object of interest requires debiasing to reduce the effects of regularization and/or model selection on the object of interest. Debiasing is typically achieved by adding a correction term to the plug-in estimator of the functional, which leads to properties such as semi-parametric efficiency, double robustness, and Neyman orthogonality. We implement an automatic debiasing procedure based on automatically learning the Riesz representation of the linear functional using Neural Nets and Random Forests. Our method only relies on black-box evaluation oracle access to the linear functional and does not require knowledge of its analytic form. We propose a multitasking Neural Net debiasing method with stochastic gradient descent minimization of a combined Riesz representer and regression loss, while sharing representation layers for the two functions. We also propose a Random Forest method which learns a locally linear representation of the Riesz function. Even though our method applies to arbitrary functionals, we experimentally find that it performs well compared to the state-of-the-art neural-net-based algorithm of Shi et al. (2019) for the case of the average treatment effect functional. We also evaluate our method on the problem of estimating average marginal effects with continuous treatments, using semi-synthetic data of gasoline price changes on gasoline demand.
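
The correction term and the Riesz representer described in this abstract can be written out as follows. The notation is an assumption chosen for illustration, since the abstract itself defines no symbols: $W=(Y,X)$ is the data, $g_0(X)=E[Y\mid X]$ the regression, $m(W;g)$ the linear functional, and $\alpha_0$ the Riesz representer. The last line specializes to the average treatment effect with $X=(D,Z)$ and propensity score $p(Z)=P(D=1\mid Z)$.

```latex
\begin{align*}
\theta_0 &= E\big[m(W; g_0)\big]
  && \text{(target: a linear functional of } g_0\text{)} \\
E\big[m(W; g)\big] &= E\big[\alpha_0(X)\, g(X)\big] \quad \text{for all square-integrable } g
  && \text{(Riesz representation)} \\
\theta_0 &= E\big[m(W; g_0) + \alpha_0(X)\,\{Y - g_0(X)\}\big]
  && \text{(plug-in plus correction term)} \\
\alpha_0 &= \arg\min_{\alpha}\; E\big[\alpha(X)^2 - 2\, m(W; \alpha)\big]
  && \text{(loss that needs only evaluations of } m\text{)} \\
m(W; g) &= g(1, Z) - g(0, Z), \qquad
\alpha_0(D, Z) = \frac{D}{p(Z)} - \frac{1 - D}{1 - p(Z)}
  && \text{(ATE example)}
\end{align*}
```

The fourth line follows from expanding $E[(\alpha-\alpha_0)^2] = E[\alpha^2] - 2E[\alpha\,\alpha_0] + E[\alpha_0^2]$ and replacing $E[\alpha\,\alpha_0]$ with $E[m(W;\alpha)]$ via the second line. This is why the representer can be learned without knowing its analytic form.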

ME · Jun 17, 2021

Causal Bias Quantification for Continuous Treatments

Gianluca Detommaso, Michael Brückner, Philip Schulz et al.

We extend the definition of the marginal causal effect to the continuous treatment setting and develop a novel characterization of causal bias in the framework of structural causal models. We prove that our derived bias expression is zero if, and only if, the causal effect is identifiable via covariate adjustment. We show that under some restrictions on the structural equations, the causal bias can be estimated efficiently and allows for causal regularization of predictive probabilistic models. We demonstrate the effectiveness of our method for causal bias quantification in various settings where (not) controlling for certain covariates would introduce causal bias.

ML · May 31, 2021

A Simple and General Debiased Machine Learning Theorem with Finite Sample Guarantees

Victor Chernozhukov, Whitney K. Newey, Rahul Singh

Debiased machine learning is a meta algorithm based on bias correction and sample splitting to calculate confidence intervals for functionals, i.e. scalar summaries, of machine learning algorithms. For example, an analyst may desire the confidence interval for a treatment effect estimated with a neural network. We provide a nonasymptotic debiased machine learning theorem that encompasses any global or local functional of any machine learning algorithm that satisfies a few simple, interpretable conditions. Formally, we prove consistency, Gaussian approximation, and semiparametric efficiency by finite sample arguments. The rate of convergence is $n^{-1/2}$ for global functionals, and it degrades gracefully for local functionals. Our results culminate in a simple set of conditions that an analyst can use to translate modern learning theory rates into traditional statistical inference. The conditions reveal a general double robustness property for ill posed inverse problems.

ML · Mar 17, 2021

DoubleML -- An Object-Oriented Implementation of Double Machine Learning in R

Philipp Bach, Victor Chernozhukov, Malte S. Kurz et al.

The R package DoubleML implements the double/debiased machine learning framework of Chernozhukov et al. (2018). It provides functionalities to estimate parameters in causal models based on machine learning methods. The double machine learning framework consists of three key ingredients: Neyman orthogonality, high-quality machine learning estimation and sample splitting. Estimation of nuisance components can be performed by various state-of-the-art machine learning methods that are available in the mlr3 ecosystem. DoubleML makes it possible to perform inference in a variety of causal models, including partially linear and interactive regression models and their extensions to instrumental variable estimation. The object-oriented implementation of DoubleML enables high flexibility in the model specification and makes it easily extendable. This paper serves as an introduction to the double machine learning framework and the R package DoubleML. In reproducible code examples with simulated and real data sets, we demonstrate how DoubleML users can perform valid inference based on machine learning methods.
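
To make the three ingredients concrete, here is a minimal, library-free sketch for the partially linear model y = theta*d + g(x) + e. It is written in plain Python and does not use or imitate the DoubleML API. The binned-mean learner, the function names, and the simulated data are all assumptions made for illustration; a real application would use a proper machine learning method for the nuisance functions.

```python
import math
import random

def binned_mean_learner(xs, ys, n_bins=20, lo=-2.0, hi=2.0):
    """Stand-in ML learner: piecewise-constant regression of ys on scalar xs."""
    width = (hi - lo) / n_bins
    def bin_of(x):
        return min(n_bins - 1, max(0, int((x - lo) / width)))
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for x, y in zip(xs, ys):
        sums[bin_of(x)] += y
        counts[bin_of(x)] += 1
    overall = sum(ys) / len(ys)
    means = [s / c if c else overall for s, c in zip(sums, counts)]
    return lambda x: means[bin_of(x)]

def dml_plr(x, d, y, n_folds=5, seed=0):
    """Cross-fitted partialling-out estimate of theta and its standard error."""
    n = len(y)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    res_d, res_y = [0.0] * n, [0.0] * n
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        # Sample splitting: nuisances are fit on the other folds only.
        m_hat = binned_mean_learner([x[i] for i in train], [d[i] for i in train])
        l_hat = binned_mean_learner([x[i] for i in train], [y[i] for i in train])
        for i in test:
            res_d[i] = d[i] - m_hat(x[i])   # residualized treatment
            res_y[i] = y[i] - l_hat(x[i])   # residualized outcome
    # Neyman-orthogonal score: (res_y - theta * res_d) * res_d = 0 on average.
    sdd = sum(v * v for v in res_d)
    theta = sum(v * u for v, u in zip(res_d, res_y)) / sdd
    se = math.sqrt(sum((v * (u - theta * v)) ** 2 for v, u in zip(res_d, res_y))) / sdd
    return theta, se

# Simulated data with a known effect (theta_true is an illustrative choice).
rng = random.Random(42)
n, theta_true = 2000, 0.5
x = [rng.uniform(-2, 2) for _ in range(n)]
d = [math.sin(xi) + rng.gauss(0, 1) for xi in x]
y = [theta_true * di + math.cos(xi) + rng.gauss(0, 1) for di, xi in zip(d, x)]
theta_hat, se = dml_plr(x, d, y)
print(theta_hat, se)
```

Residualizing both the outcome and the treatment is what makes the final regression insensitive to small errors in the two nuisance fits.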

EM · Dec 30, 2020

Adversarial Estimation of Riesz Representers

Victor Chernozhukov, Whitney Newey, Rahul Singh et al.

Many causal parameters are linear functionals of an underlying regression. The Riesz representer is a key component in the asymptotic variance of a semiparametrically estimated linear functional. We propose an adversarial framework to estimate the Riesz representer using general function spaces. We prove a nonasymptotic mean square rate in terms of an abstract quantity called the critical radius, then specialize it for neural networks, random forests, and reproducing kernel Hilbert spaces as leading cases. Our estimators are highly compatible with targeted and debiased machine learning with sample splitting; our guarantees directly verify general conditions for inference that allow mis-specification. We also use our guarantees to prove inference without sample splitting, based on stability or complexity. Our estimators achieve nominal coverage in highly nonlinear simulations where some previous methods break down. They shed new light on the heterogeneous effects of matching grants.

ST · Dec 27, 2019

Minimax Semiparametric Learning With Approximate Sparsity

Jelena Bradic, Victor Chernozhukov, Whitney K. Newey et al.

Estimating linear, mean-square continuous functionals is a pivotal challenge in statistics. In high-dimensional contexts, this estimation is often performed under the assumption of exact model sparsity, meaning that only a small number of parameters are precisely non-zero. This excludes models where linear formulations only approximate the underlying data distribution, such as nonparametric regression methods that use basis expansion such as splines, kernel methods or polynomial regressions. Many recent methods for root-$n$ estimation have been proposed, but the implications of exact model sparsity remain largely unexplored. In particular, minimax optimality for models that are not exactly sparse has not yet been developed. This paper formalizes the concept of approximate sparsity through classical semi-parametric theory. We derive minimax rates under this formulation for a regression slope and an average derivative, finding these bounds to be substantially larger than those in low-dimensional, semi-parametric settings. We identify several new phenomena. We discover new regimes where rate double robustness does not hold, yet root-$n$ estimation is still possible. In these settings, we propose an estimator that achieves minimax optimal rates. Our findings further reveal distinct optimality boundaries for ordered versus unordered nonparametric regression estimation.

ML · Aug 24, 2019

Welfare Analysis in Dynamic Models

Victor Chernozhukov, Whitney Newey, Vira Semenova

This paper introduces metrics for welfare analysis in dynamic models. We develop estimation and inference for these parameters even in the presence of a high-dimensional state space. Examples of welfare metrics include average welfare, average marginal welfare effects, and welfare decompositions into direct and indirect effects similar to Oaxaca (1973) and Blinder (1973). We derive dual and doubly robust representations of welfare metrics that facilitate debiased inference. For average welfare, the value function does not have to be estimated. In general, debiasing can be applied to any estimator of the value function, including neural nets, random forests, Lasso, boosting, and other high-dimensional methods. In particular, we derive Lasso and Neural Network estimators of the value function and associated dynamic dual representation and establish associated mean square convergence rates for these functions. Debiasing is automatic in the sense that it only requires knowledge of the welfare metric of interest, not the form of bias correction. The proposed methods are applied to estimate a dynamic behavioral model of teacher absenteeism in \cite{DHR} and associated average teacher welfare.

EM · May 24, 2019

Semi-Parametric Efficient Policy Learning with Continuous Actions

Mert Demirer, Vasilis Syrgkanis, Greg Lewis et al.

We consider off-policy evaluation and optimization with continuous action spaces. We focus on observational data where the data collection policy is unknown and needs to be estimated. We take a semi-parametric approach where the value function takes a known parametric form in the treatment, but we are agnostic on how it depends on the observed contexts. We propose a doubly robust off-policy estimate for this setting and show that off-policy optimization based on this estimate is robust to estimation errors of the policy function or the regression model. Our results also apply if the model does not satisfy our semi-parametric form, but rather we measure regret in terms of the best projection of the true value function to this functional space. Our work extends prior approaches of policy optimization from observational data that only considered discrete actions. We provide an experimental evaluation of our method in a synthetic data example motivated by optimal personalized pricing and costly resource allocation.

EM · Dec 11, 2018

Closing the U.S. gender wage gap requires understanding its heterogeneity

Philipp Bach, Victor Chernozhukov, Martin Spindler

In 2016, the majority of full-time employed women in the U.S. earned significantly less than comparable men. The extent to which women were affected by gender inequality in earnings, however, depended greatly on socio-economic characteristics, such as marital status or educational attainment. In this paper, we analyzed data from the 2016 American Community Survey using a high-dimensional wage regression and applying double lasso to quantify heterogeneity in the gender wage gap. We found that the gap varied substantially across women and was driven primarily by marital status, having children at home, race, occupation, industry, and educational attainment. We recommend that policy makers use these insights to design policies that will reduce discrimination and unequal pay more effectively.

EM · Sep 13, 2018

Valid Simultaneous Inference in High-Dimensional Settings (with the hdm package for R)

Philipp Bach, Victor Chernozhukov, Martin Spindler

Due to the increasing availability of high-dimensional empirical applications in many research disciplines, valid simultaneous inference becomes more and more important. For instance, high-dimensional settings might arise in economic studies due to very rich data sets with many potential covariates or in the analysis of treatment heterogeneities. Also, the evaluation of potentially more complicated (non-linear) functional forms of the regression relationship leads to many potential variables for which simultaneous inferential statements might be of interest. Here we provide a review of classical and modern methods for simultaneous inference in (high-dimensional) settings and illustrate their use by a case study using the R package hdm. The R package hdm implements valid, powerful, and efficient joint hypothesis tests for a potentially large number of coefficients as well as the construction of simultaneous confidence intervals and, therefore, provides useful methods to perform valid post-selection inference based on the LASSO.

ME · Aug 30, 2018

Uniform Inference in High-Dimensional Gaussian Graphical Models

Sven Klaassen, Jannis Kück, Martin Spindler et al.

Graphical models have become a very popular tool for representing dependencies within a large set of variables and are key for representing causal structures. We provide results for uniform inference on high-dimensional graphical models with the number of target parameters $d$ possibly much larger than the sample size. This is particularly important when certain features or structures of a causal model should be recovered. Our results highlight how in high-dimensional settings graphical models can be estimated and recovered with modern machine learning methods in complex data sets. To construct simultaneous confidence regions on many target parameters, sufficiently fast estimation rates of the nuisance functions are crucial. In this context, we establish uniform estimation rates and sparsity guarantees of the square-root estimator in a random design under approximate sparsity conditions that might be of independent interest for related problems in high dimensions. We also demonstrate in a comprehensive simulation study that our procedure has good small sample properties.

ML · Feb 23, 2018

De-Biased Machine Learning of Global and Local Parameters Using Regularized Riesz Representers

Victor Chernozhukov, Whitney Newey, Rahul Singh

We provide adaptive inference methods, based on $\ell_1$ regularization, for regular (semi-parametric) and non-regular (nonparametric) linear functionals of the conditional expectation function. Examples of regular functionals include average treatment effects, policy effects, and derivatives. Examples of non-regular functionals include average treatment effects, policy effects, and derivatives conditional on a covariate subvector fixed at a point. We construct a Neyman orthogonal equation for the target parameter that is approximately invariant to small perturbations of the nuisance parameters. To achieve this property, we include the Riesz representer for the functional as an additional nuisance parameter. Our analysis yields weak "double sparsity robustness": either the approximation to the regression or the approximation to the representer can be "completely dense" as long as the other is sufficiently "sparse". Our main results are non-asymptotic and imply asymptotic uniform validity over large classes of models, translating into honest confidence bands for both global and local parameters.

ML · Feb 17, 2018

Exact and Robust Conformal Inference Methods for Predictive Machine Learning With Dependent Data

Victor Chernozhukov, Kaspar Wuthrich, Yinchu Zhu

We extend conformal inference to general settings that allow for time series data. Our proposal is developed as a randomization method and accounts for potential serial dependence by including block structures in the permutation scheme. As a result, the proposed method retains the exact, model-free validity when the data are i.i.d. or more generally exchangeable, similar to usual conformal inference methods. When exchangeability fails, as is the case for common time series data, the proposed approach is approximately valid under weak assumptions on the conformity score.

ML · Dec 28, 2017

Estimation and Inference on Heterogeneous Treatment Effects in High-Dimensional Dynamic Panels under Weak Dependence

Vira Semenova, Matt Goldman, Victor Chernozhukov et al.

This paper provides estimation and inference methods for conditional average treatment effects (CATE) characterized by a high-dimensional parameter in both homogeneous cross-sectional and unit-heterogeneous dynamic panel data settings. In our leading example, we model CATE by interacting the base treatment variable with explanatory variables. The first step of our procedure is orthogonalization, where we partial out the controls and unit effects from the outcome and the base treatment and take the cross-fitted residuals. This step uses a novel generic cross-fitting method we design for weakly dependent time series and panel data. This method "leaves out the neighbors" when fitting nuisance components, and we theoretically power it by using Strassen's coupling. As a result, we can rely on any modern machine learning method in the first step, provided it learns the residuals well enough. Second, we construct an orthogonal (or residual) learner of CATE -- the Lasso CATE -- that regresses the outcome residual on the vector of interactions of the residualized treatment with explanatory variables. If the CATE function is less complex than the first-stage regression, the orthogonal learner converges faster than the single-stage regression-based learner. Third, we perform simultaneous inference on parameters of the CATE function using debiasing. We can also use ordinary least squares in the last two steps when CATE is low-dimensional. In heterogeneous panel data settings, we model the unobserved unit heterogeneity as a weakly sparse deviation from Mundlak (1978)'s model of correlated unit effects as a linear function of time-invariant covariates and make use of L1-penalization to estimate these models. We demonstrate our methods by estimating price elasticities of groceries based on scanner data. We note that our results are new even for the cross-sectional (i.i.d.) case.
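
The "leave out the neighbors" idea can be illustrated with a small fold generator for time-indexed data. The training set for each fold drops the test block and also a buffer of adjacent periods on either side, so that weakly dependent observations near the test block do not leak into the nuisance fit. This is only a sketch of the idea as the abstract describes it. The function name, the contiguous-block layout, and the `buffer` parameter are assumptions, and the paper's exact construction may differ.

```python
def neighbors_left_out_folds(n_periods, n_folds, buffer):
    """Yield (train, test) lists of time indices.

    test  : one contiguous block of periods
    train : all periods at least `buffer` + 1 steps away from the test block
    """
    bounds = [k * n_periods // n_folds for k in range(n_folds + 1)]
    for k in range(n_folds):
        start, stop = bounds[k], bounds[k + 1]
        test = list(range(start, stop))
        train = [t for t in range(n_periods)
                 if t < start - buffer or t >= stop + buffer]
        yield train, test

# Example: 20 periods, 4 folds, drop 2 neighboring periods on each side.
for train, test in neighbors_left_out_folds(20, 4, 2):
    print("test:", test, "train:", train)
```

A larger buffer reduces dependence between training and test periods at the cost of fewer training observations per fold.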

ML · Dec 13, 2017

Fisher-Schultz Lecture: Generic Machine Learning Inference on Heterogenous Treatment Effects in Randomized Experiments, with an Application to Immunization in India

Victor Chernozhukov, Mert Demirer, Esther Duflo et al.

We propose strategies to estimate and make inference on key features of heterogeneous effects in randomized experiments. These key features include best linear predictors of the effects using machine learning proxies, average effects sorted by impact groups, and average characteristics of most and least impacted units. The approach is valid in high-dimensional settings, where the effects are proxied (but not necessarily consistently estimated) by predictive and causal machine learning methods. We post-process these proxies into estimates of the key features. Our approach is generic: it can be used in conjunction with penalized methods, neural networks, random forests, boosted trees, and ensemble methods, both predictive and causal. Estimation and inference are based on repeated data splitting to avoid overfitting and achieve validity. We use quantile aggregation of the results across many potential splits, in particular taking medians of p-values and medians and other quantiles of confidence intervals. We show that quantile aggregation lowers estimation risks over a single split procedure, and establish its principal inferential properties. Finally, our analysis reveals ways to build provably better machine learning proxies through causal learning: we can use the objective functions that we develop to construct the best linear predictors of the effects, to obtain better machine learning proxies in the initial step. We illustrate the use of both inferential tools and causal learners with a randomized field experiment that evaluates a combination of nudges to stimulate demand for immunization in India.
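
The aggregation step can be sketched as taking medians, across splits, of each reported quantity. The input format below (one dict per split with hypothetical keys `estimate`, `lower`, `upper`, `p_value`) and the numbers are assumptions for illustration. The sketch also omits any adjustment of the nominal level that valid inference after aggregation may require.

```python
from statistics import median

def aggregate_over_splits(results):
    """Median-aggregate per-split results.

    results: list of dicts, one per data split, each with the keys
             'estimate', 'lower', 'upper', 'p_value' (hypothetical layout).
    """
    keys = ("estimate", "lower", "upper", "p_value")
    return {key: median(r[key] for r in results) for key in keys}

# Made-up results from three random splits of the same data set.
results = [
    {"estimate": 0.8, "lower": 0.2, "upper": 1.4, "p_value": 0.03},
    {"estimate": 1.1, "lower": 0.5, "upper": 1.7, "p_value": 0.01},
    {"estimate": 0.9, "lower": 0.1, "upper": 1.6, "p_value": 0.20},
]
summary = aggregate_over_splits(results)
print(summary)
```

Each quantity is aggregated separately, so the reported interval endpoints and point estimate may come from different splits.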

ME · Feb 21, 2017

Debiased Machine Learning of Conditional Average Treatment Effects and Other Causal Functions

Vira Semenova, Victor Chernozhukov

This paper provides estimation and inference methods for the best linear predictor (approximation) of a structural function, such as conditional average structural and treatment effects, and structural derivatives, based on modern machine learning (ML) tools. We represent this structural function as a conditional expectation of an unbiased signal that depends on a nuisance parameter, which we estimate by modern machine learning techniques. We first adjust the signal to make it insensitive (Neyman-orthogonal) with respect to the first-stage regularization bias. We then project the signal onto a set of basis functions, growing with sample size, which gives us the best linear predictor of the structural function. We derive a complete set of results for estimation and simultaneous inference on all parameters of the best linear predictor, conducting inference by Gaussian bootstrap. When the structural function is smooth and the basis is sufficiently rich, our estimation and inference result automatically targets this function. When basis functions are group indicators, the best linear predictor reduces to group average treatment/structural effect, and our inference automatically targets these parameters. We demonstrate our method by estimating uniform confidence bands for the average price elasticity of gasoline demand conditional on income.

MLJan 30, 2017

Double/Debiased/Neyman Machine Learning of Treatment Effects

Victor Chernozhukov, Denis Chetverikov, Mert Demirer et al.

Chernozhukov, Chetverikov, Demirer, Duflo, Hansen, and Newey (2016) provide a generic double/de-biased machine learning (DML) approach for obtaining valid inferential statements about focal parameters, using Neyman-orthogonal scores and cross-fitting, in settings where nuisance parameters are estimated using a new generation of nonparametric fitting methods for high-dimensional data, called machine learning methods. In this note, we illustrate the application of this method in the context of estimating average treatment effects (ATE) and average treatment effects on the treated (ATTE) using observational data. A more general discussion and references to the existing literature are available in Chernozhukov, Chetverikov, Demirer, Duflo, Hansen, and Newey (2016).
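
For reference, the Neyman-orthogonal (doubly robust) score commonly used for the ATE in this setting can be written as below. The notation is an assumption introduced here, not taken from the abstract: $g(d, X)$ denotes the outcome regression $E[Y \mid D = d, X]$, $m(X)$ the propensity score $P(D = 1 \mid X)$, and $\theta$ the ATE.

$$
\psi(W; \theta, g, m) = g(1, X) - g(0, X) + \frac{D\,\bigl(Y - g(1, X)\bigr)}{m(X)} - \frac{(1 - D)\,\bigl(Y - g(0, X)\bigr)}{1 - m(X)} - \theta
$$

Setting the cross-fitted sample average of $\psi$ to zero, with $g$ and $m$ replaced by machine learning estimates, gives the estimate of $\theta$.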

MEAug 1, 2016

hdm: High-Dimensional Metrics

Victor Chernozhukov, Chris Hansen, Martin Spindler

In this article the package High-dimensional Metrics (hdm) is introduced. It is a collection of statistical methods for estimation and quantification of uncertainty in high-dimensional approximately sparse models. It focuses on providing confidence intervals and significance testing for (possibly many) low-dimensional subcomponents of the high-dimensional parameter vector. Efficient estimators and uniformly valid confidence intervals are provided for regression coefficients on target variables (e.g., treatment or policy variable) in a high-dimensional approximately sparse regression model, for average treatment effect (ATE) and average treatment effect for the treated (ATET), as well as for extensions of these parameters to the endogenous setting. Theory-grounded, data-driven methods for selecting the penalization parameter in Lasso regressions under heteroscedastic and non-Gaussian errors are implemented. Moreover, joint/simultaneous confidence intervals for regression coefficients of a high-dimensional sparse regression are implemented. Data sets that have been used in the literature and might be useful for classroom demonstration and for testing new estimators are included.

MLJul 30, 2016

Double/Debiased Machine Learning for Treatment and Causal Parameters

Victor Chernozhukov, Denis Chetverikov, Mert Demirer et al.

Most modern supervised statistical/machine learning (ML) methods are explicitly designed to solve prediction problems very well. Achieving this goal does not imply that these methods automatically deliver good estimators of causal parameters. Examples of such parameters include individual regression coefficients, average treatment effects, average lifts, and demand or supply elasticities. In fact, estimates of such causal parameters obtained via naively plugging ML estimators into estimating equations for such parameters can behave very poorly due to regularization bias. Fortunately, this regularization bias can be removed by solving auxiliary prediction problems via ML tools. Specifically, we can form an orthogonal score for the target low-dimensional parameter by combining auxiliary and main ML predictions. The score is then used to build a de-biased estimator of the target parameter which typically will converge at the fastest possible $1/\sqrt{n}$ rate and be approximately unbiased and normal, and from which valid confidence intervals for these parameters of interest may be constructed. The resulting method thus could be called a "double ML" method because it relies on estimating primary and auxiliary predictive models. In order to avoid overfitting, our construction also makes use of K-fold sample splitting, which we call cross-fitting. This allows us to use a very broad set of ML predictive methods in solving the auxiliary and main prediction problems, such as random forest, lasso, ridge, deep neural nets, boosted trees, as well as various hybrids and aggregators of these methods.
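
The combination of an orthogonal score with K-fold cross-fitting can be sketched as below for a partially linear model, $Y = \theta D + g(X) + \varepsilon$. This is a minimal sketch under stated assumptions rather than the authors' implementation: the model, the simulated data, and the names `fit_line`, `dml_partially_linear`, and `simulate` are hypothetical, and a univariate least-squares fit stands in for whatever ML learner would be used in practice.

```python
import math
import random


def fit_line(x, y):
    """Least-squares fit of y on a single covariate x; stand-in for any ML learner."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda t: intercept + slope * t


def dml_partially_linear(y, d, x, n_folds=5, seed=0):
    """Cross-fitted partialling-out estimate of theta and its standard error."""
    n = len(y)
    order = list(range(n))
    random.Random(seed).shuffle(order)
    folds = [order[k::n_folds] for k in range(n_folds)]
    y_res, d_res = [0.0] * n, [0.0] * n
    for held_out in folds:
        held = set(held_out)
        train = [i for i in order if i not in held]
        x_train = [x[i] for i in train]
        # Auxiliary and main predictions are fit on the other folds only.
        predict_y = fit_line(x_train, [y[i] for i in train])
        predict_d = fit_line(x_train, [d[i] for i in train])
        for i in held_out:
            y_res[i] = y[i] - predict_y(x[i])
            d_res[i] = d[i] - predict_d(x[i])
    # Solve the orthogonal moment condition: regress Y residuals on D residuals.
    denom = sum(v * v for v in d_res)
    theta = sum(v * u for v, u in zip(d_res, y_res)) / denom
    # Plug-in standard error based on the estimated score.
    score_sq = sum(((u - theta * v) * v) ** 2 for v, u in zip(d_res, y_res))
    se = math.sqrt(score_sq) / denom
    return theta, se


def simulate(n=2000, theta=0.5, seed=1):
    """Simulated data (assumed design): X confounds both D and Y."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    d = [0.8 * xi + rng.gauss(0, 1) for xi in x]
    y = [theta * di + 1.5 * xi + rng.gauss(0, 1) for di, xi in zip(d, x)]
    return y, d, x


y, d, x = simulate()
theta_hat, se = dml_partially_linear(y, d, x)
print(f"theta_hat = {theta_hat:.3f}, se = {se:.3f}")
print(f"95% CI: [{theta_hat - 1.96 * se:.3f}, {theta_hat + 1.96 * se:.3f}]")
```

Each observation's residuals come from models fit without that observation's fold, which is what the cross-fitting step is meant to guarantee. Swapping `fit_line` for a richer learner leaves the rest of the sketch unchanged.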

STNov 11, 2013

Program Evaluation and Causal Inference with High-Dimensional Data

Alexandre Belloni, Victor Chernozhukov, Ivan Fernández-Val et al.

In this paper, we provide efficient estimators and honest confidence bands for a variety of treatment effects including local average (LATE) and local quantile treatment effects (LQTE) in data-rich environments. We can handle very many control variables, endogenous receipt of treatment, heterogeneous treatment effects, and function-valued outcomes. Our framework covers the special case of exogenous receipt of treatment, either conditional on controls or unconditionally as in randomized control trials. In the latter case, our approach produces efficient estimators and honest bands for (functional) average treatment effects (ATE) and quantile treatment effects (QTE). To make informative inference possible, we assume that key reduced form predictive relationships are approximately sparse. This assumption allows the use of regularization and selection methods to estimate those relations, and we provide methods for post-regularization and post-selection inference that are uniformly valid (honest) across a wide range of models. We show that a key ingredient enabling honest inference is the use of orthogonal or doubly robust moment conditions in estimating certain reduced form functional parameters. We illustrate the use of the proposed methods with an application to estimating the effect of 401(k) eligibility and participation on accumulated assets.