ML | May 12, 2022 | Code
Addressing Census data problems in race imputation via fully Bayesian Improved Surname Geocoding and name supplements
Kosuke Imai, Santiago Olivella, Evan T. R. Rosenman
Predicting an individual's race and ethnicity plays an important role in social science and public health research. Examples include studies of racial disparity in health and voting. Recently, Bayesian Improved Surname Geocoding (BISG), which uses Bayes' rule to combine information from Census surname files with the geocoding of an individual's residence, has emerged as a leading methodology for this prediction task. Unfortunately, BISG suffers from two Census data problems that contribute to unsatisfactory predictive performance for minorities. First, the decennial Census often contains zero counts for minority racial groups in the Census blocks where some members of those groups reside. Second, because the Census surname files only include frequent names, many surnames -- especially those of minorities -- are missing from the list. To address the zero counts problem, we introduce a fully Bayesian Improved Surname Geocoding (fBISG) methodology that accounts for potential measurement error in Census counts by extending the naive Bayesian inference of the BISG methodology to full posterior inference. To address the missing surname problem, we supplement the Census surname data with additional data on last, first, and middle names taken from the voter files of six Southern states where self-reported race is available. Our empirical validation shows that the fBISG methodology and name supplements significantly improve the accuracy of race imputation across all racial groups, and especially for Asians. The proposed methodology, together with additional name data, is available via the open-source software WRU.
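The Bayes' rule step behind standard BISG can be sketched as follows. This is a minimal illustration, not the WRU implementation: it assumes surname and location are conditionally independent given race, and the function name and all probabilities are hypothetical. The zero entry for one group shows how a zero Census count forces the posterior for that group to zero, which is the problem fBISG is meant to address.

```python
RACES = ("white", "black", "hispanic", "asian", "other")


def bisg_posterior(p_race_given_surname, p_geo_given_race):
    """Return P(race | surname, location) via Bayes' rule.

    Assumes surname and location are conditionally independent given race,
    so the posterior is proportional to P(race | surname) * P(location | race).
    """
    unnormalized = {
        r: p_race_given_surname[r] * p_geo_given_race[r] for r in RACES
    }
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("every race has zero probability; posterior undefined")
    return {r: value / total for r, value in unnormalized.items()}


# Hypothetical inputs (not taken from any Census file).
p_race_given_surname = {
    "white": 0.60, "black": 0.20, "hispanic": 0.10, "asian": 0.05, "other": 0.05,
}
# P(block | race); the zero mimics a Census block with no recorded Asian residents.
p_geo_given_race = {
    "white": 0.001, "black": 0.004, "hispanic": 0.002, "asian": 0.0, "other": 0.001,
}

posterior = bisg_posterior(p_race_given_surname, p_geo_given_race)
for race in RACES:
    print(f"{race}: {posterior[race]:.3f}")
```

Because the posterior is a product of the two inputs, any group with a zero block count receives zero posterior probability regardless of how informative the surname is.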
TH | May 6
An Axiomatic Foundation for Decisions with Counterfactual Utility
Benedikt Koch, Kosuke Imai, Tomasz Strzalecki
Counterfactual utilities evaluate decisions not only by the realized outcome under a given decision, but also by the counterfactual outcomes that would arise under alternative decisions. By generalizing standard utility frameworks, they allow decision-makers to encode asymmetric criteria, such as avoiding harm and anticipating regret. Recent work, however, has raised fundamental concerns about the coherence and transitivity of counterfactual utilities. We address these concerns by extending the von Neumann-Morgenstern (vNM) framework to preferences defined on the extended space of all potential outcomes rather than realized outcomes alone. We show that expected counterfactual utility satisfies the vNM axioms on this extended domain, thereby admitting a coherent preference representation. We further examine how counterfactual preferences map onto the realized outcome space through menu-dependent and context-dependent projections. This axiomatic framework reconciles apparent inconsistencies highlighted by the Russian roulette example in the statistics literature and resolves the well-known Allais paradox from behavioral economics. We also derive an additional axiom required to reduce counterfactual utilities to standard utilities on the same potential outcome space, and establish an axiomatic foundation for additive counterfactual utilities, which satisfy a necessary and sufficient condition for point identification. Finally, we show that our results hold regardless of whether individual potential outcomes are deterministic or stochastic.
ME | Oct 15, 2022
Distributionally Robust Causal Inference with Observational Data
Dimitris Bertsimas, Kosuke Imai, Michael Lingzhi Li
We consider the estimation of average treatment effects in observational studies and propose a new framework of robust causal inference with unobserved confounders. Our approach is based on distributionally robust optimization and proceeds in two steps. We first specify the maximal degree to which the distribution of unobserved potential outcomes may deviate from that of observed outcomes. We then derive sharp bounds on the average treatment effects under this assumption. Our framework encompasses the popular marginal sensitivity model as a special case, and we demonstrate how the proposed methodology can address a primary challenge of the marginal sensitivity model, namely that it produces uninformative results when unobserved confounders substantially affect treatment and outcome. Specifically, we develop an alternative sensitivity model, called the distributional sensitivity model, under the assumption that heterogeneity of treatment effect due to unobserved variables is relatively small. Unlike the marginal sensitivity model, the distributional sensitivity model allows for potential lack of overlap and often produces informative bounds even when unobserved variables substantially affect both treatment and outcome. Finally, we show how to extend the distributional sensitivity model to difference-in-differences designs and settings with instrumental variables. Through simulation and empirical studies, we demonstrate the applicability of the proposed methodology.
ML | Jun 21, 2022
Policy Learning with Asymmetric Counterfactual Utilities
Eli Ben-Michael, Kosuke Imai, Zhichao Jiang
Data-driven decision making plays an important role even in high stakes settings like medicine and public policy. Learning optimal policies from observed data requires a careful formulation of the utility function whose expected value is maximized across a population. Although researchers typically use utilities that depend on observed outcomes alone, in many settings the decision maker's utility function is more properly characterized by the joint set of potential outcomes under all actions. For example, the Hippocratic principle to "do no harm" implies that the cost of causing death to a patient who would otherwise survive without treatment is greater than the cost of forgoing life-saving treatment. We consider optimal policy learning with asymmetric counterfactual utility functions of this form that consider the joint set of potential outcomes. We show that asymmetric counterfactual utilities lead to an unidentifiable expected utility function, and so we first partially identify it. Drawing on statistical decision theory, we then derive minimax decision rules by minimizing the maximum expected utility loss relative to different alternative policies. We show that one can learn minimax loss decision rules from observed data by solving intermediate classification problems, and establish that the finite sample excess expected utility loss of this procedure is bounded by the regret of these intermediate classifiers. We apply this conceptual framework and methodology to the decision about whether or not to use right heart catheterization for patients with possible pulmonary hypertension.
OT | Aug 26, 2022
Race and ethnicity data for first, middle, and last names
Evan T. R. Rosenman, Santiago Olivella, Kosuke Imai
We provide the largest compiled publicly available dictionaries of first, middle, and last names for the purpose of imputing race and ethnicity using, for example, Bayesian Improved Surname Geocoding (BISG). The dictionaries are based on the voter files of six Southern states that collect self-reported racial data upon voter registration. Our data cover a much larger scope of names than any comparable dataset, containing roughly one million first names, 1.1 million middle names, and 1.4 million surnames. Individuals are categorized into five mutually exclusive racial and ethnic groups -- White, Black, Hispanic, Asian, and Other -- and racial/ethnic counts by name are provided for every name in each dictionary. Counts can then be normalized row-wise or column-wise to obtain conditional probabilities of race given name or name given race. These conditional probabilities can then be deployed for imputation in a data analytic task for which ground truth racial and ethnic data is not available.
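The row-wise and column-wise normalization described above can be sketched as follows. The two-name table, its counts, and the function names are hypothetical and only illustrate the arithmetic; the published dictionaries are far larger and may use a different layout.

```python
RACES = ("white", "black", "hispanic", "asian", "other")

# Hypothetical counts by name and race (not taken from the actual dictionaries).
counts = {
    "GARCIA": {"white": 50, "black": 10, "hispanic": 900, "asian": 20, "other": 20},
    "SMITH": {"white": 700, "black": 250, "hispanic": 20, "asian": 10, "other": 20},
}


def race_given_name(counts):
    """Row-wise normalization: P(race | name) for each name."""
    probs = {}
    for name, row in counts.items():
        total = sum(row[r] for r in RACES)
        # A name with no recorded individuals has no defined distribution.
        probs[name] = {r: row[r] / total for r in RACES} if total > 0 else None
    return probs


def name_given_race(counts):
    """Column-wise normalization: P(name | race) for each race."""
    totals = {r: sum(row[r] for row in counts.values()) for r in RACES}
    return {
        name: {r: (row[r] / totals[r] if totals[r] > 0 else None) for r in RACES}
        for name, row in counts.items()
    }


p_race = race_given_name(counts)
p_name = name_given_race(counts)
print(p_race["GARCIA"]["hispanic"])  # share of GARCIA records that are Hispanic
print(p_name["SMITH"]["white"])      # share of White records named SMITH
```

Row-wise probabilities sum to one across races for each name, while column-wise probabilities sum to one across names for each race.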
LG | Jul 17, 2023
Bayesian Safe Policy Learning with Chance Constrained Optimization: Application to Military Security Assessment during the Vietnam War
Zeyang Jia, Eli Ben-Michael, Kosuke Imai
Algorithmic decisions and recommendations are used in many high-stakes decision-making settings such as criminal justice, medicine, and public policy. We investigate whether it would have been possible to improve a security assessment algorithm employed during the Vietnam War, using outcomes measured immediately after its introduction in late 1969. This empirical application raises several methodological challenges that frequently arise in high-stakes algorithmic decision-making. First, before implementing a new algorithm, it is essential to characterize and control the risk of yielding worse outcomes than the existing algorithm. Second, the existing algorithm is deterministic, and learning a new algorithm requires transparent extrapolation. Third, the existing algorithm involves discrete decision tables that are difficult to optimize over. To address these challenges, we introduce the Average Conditional Risk (ACRisk), which first quantifies the risk that a new algorithmic policy leads to worse outcomes for subgroups of individual units and then averages this over the distribution of subgroups. We also propose a Bayesian policy learning framework that maximizes the posterior expected value while controlling the posterior expected ACRisk. This framework separates the estimation of heterogeneous treatment effects from policy optimization, enabling flexible estimation of effects and optimization over complex policy classes. We characterize the resulting chance-constrained optimization problem as a constrained linear programming problem. Our analysis shows that compared to the actual algorithm used during the Vietnam War, the learned algorithm assesses most regions as more secure and emphasizes economic and political factors over military factors.
AP | Mar 23
Generalized Sequential Monte Carlo Sampling for Redistricting Simulation
Philip O'Sullivan, Kosuke Imai, Cory McCartan
Simulation methods have become important tools for quantifying partisan and racial bias in redistricting plans. We generalize the Sequential Monte Carlo (SMC) algorithm of McCartan and Imai (2023), one of the commonly used approaches. First, our generalized SMC (gSMC) algorithm can split off regions of arbitrary size, rather than a single district as in the original SMC framework, enabling the sampling of multi-member districts. Second, the gSMC algorithm can operate over various sampling spaces, providing additional computational flexibility. Third, we derive optimal-variance incremental weights and show how to compute them efficiently for each sampling space. Finally, we incorporate Markov chain Monte Carlo (MCMC) steps, creating a hybrid gSMC-MCMC algorithm that can be used for large-scale redistricting applications. We demonstrate the effectiveness of the proposed methodology through analyses of the Irish Parliament, which uses multi-member districts, and the Pennsylvania House of Representatives, which has more than 200 single-member districts.
LG | Jul 5, 2025 | Code
GenAI-Powered Inference
Kosuke Imai, Kentaro Nakamura
We introduce GenAI-Powered Inference (GPI), a statistical framework for both causal and predictive inference using unstructured data, including text and images. GPI leverages open-source Generative Artificial Intelligence (GenAI) models -- such as large language models and diffusion models -- not only to generate unstructured data at scale but also to extract low-dimensional representations that are guaranteed to capture their underlying structure. Applying machine learning to these representations, GPI enables estimation of causal and predictive effects while quantifying associated estimation uncertainty. Unlike existing approaches to representation learning, GPI does not require fine-tuning of generative models, making it computationally efficient and broadly accessible. We illustrate the versatility of the GPI framework through three applications: (1) analyzing Chinese social media censorship, (2) estimating predictive effects of candidates' facial appearance on electoral outcomes, and (3) assessing the persuasiveness of political rhetoric. An open-source software package is available for implementing GPI.
ME | Nov 4, 2023
Individualized Policy Evaluation and Learning under Clustered Network Interference
Yi Zhang, Kosuke Imai
Although there is now a large literature on policy evaluation and learning, much of the prior work assumes that the treatment assignment of one unit does not affect the outcome of another unit. Unfortunately, ignoring interference can lead to biased policy evaluation and ineffective learned policies. For example, treating influential individuals who have many friends can generate positive spillover effects, thereby improving the overall performance of an individualized treatment rule (ITR). We consider the problem of evaluating and learning an optimal ITR under clustered network interference (also known as partial interference), where clusters of units are sampled from a population and units may influence one another within each cluster. Unlike previous methods that impose strong restrictions on spillover effects, such as anonymous interference, the proposed methodology only assumes a semiparametric structural model, where each unit's outcome is an additive function of individual treatments within the cluster. Under this model, we propose an estimator that can be used to evaluate the empirical performance of an ITR. We show that this estimator is substantially more efficient than the standard inverse probability weighting estimator, which does not impose any assumption about spillover effects. We derive the finite-sample regret bound for a learned ITR, showing that the use of our efficient evaluation estimator leads to the improved performance of learned policies. We consider both experimental and observational studies, and for the latter, we develop a doubly robust estimator that is semiparametrically efficient and yields an optimal regret bound. Finally, we conduct simulation and empirical studies to illustrate the advantages of the proposed methodology.
ME | Jan 20, 2022 | Code
Using Machine Learning to Test Causal Hypotheses in Conjoint Analysis
Dae Woong Ham, Kosuke Imai, Lucas Janson
Conjoint analysis is a popular experimental design used to measure multidimensional preferences. Researchers examine how varying a factor of interest, while controlling for other relevant factors, influences decision-making. Currently, there exist two methodological approaches to analyzing data from a conjoint experiment. The first focuses on estimating the average marginal effects of each factor while averaging over the other factors. Although this allows for straightforward design-based estimation, the results critically depend on the distribution of other factors and how interaction effects are aggregated. An alternative model-based approach can compute various quantities of interest, but requires researchers to correctly specify the model, a challenging task for conjoint analysis with many factors and possible interactions. In addition, a commonly used logistic regression has poor statistical properties even with a moderate number of factors when incorporating interactions. We propose a new hypothesis testing approach based on the conditional randomization test to answer the most fundamental question of conjoint analysis: Does a factor of interest matter in any way given the other factors? Our methodology is solely based on the randomization of factors, and hence is free from assumptions. Yet, it allows researchers to use any test statistic, including those based on complex machine learning algorithms. As a result, we are able to combine the strengths of the existing design-based and model-based approaches. We illustrate the proposed methodology through conjoint analysis of immigration preferences and political candidate evaluation. We also extend the proposed approach to test for regularity assumptions commonly used in conjoint analysis. An open-source software package is available for implementing the proposed methodology.
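The general shape of a conditional randomization test can be sketched as follows. This is a simplified illustration rather than the authors' software: it assumes the factor of interest was randomized uniformly and independently of the other factors, and the function names, the test statistic, and the data are hypothetical. The statistic argument can be any function of the data, which is where a machine learning model could be substituted.

```python
import random


def abs_diff_in_means(outcomes, factor, other_factors):
    """A simple test statistic; other_factors is unused here but kept in the
    signature so that richer, model-based statistics can condition on it."""
    treated = [y for y, a in zip(outcomes, factor) if a == 1]
    control = [y for y, a in zip(outcomes, factor) if a == 0]
    if not treated or not control:
        return 0.0
    return abs(sum(treated) / len(treated) - sum(control) / len(control))


def conditional_randomization_test(outcomes, factor, other_factors, levels,
                                   statistic, n_draws=999, seed=0):
    """Return a randomization p-value for 'the factor matters in some way'.

    Assumes each unit's factor level was drawn uniformly from `levels`,
    independently of the other factors (an assumption of this sketch).
    """
    rng = random.Random(seed)
    observed = statistic(outcomes, factor, other_factors)
    at_least_as_large = 0
    for _ in range(n_draws):
        # Redraw the factor from its known randomization distribution.
        resampled = [rng.choice(levels) for _ in factor]
        if statistic(outcomes, resampled, other_factors) >= observed:
            at_least_as_large += 1
    return (1 + at_least_as_large) / (1 + n_draws)


# Hypothetical data: the outcome mostly follows the factor of interest.
n = 200
factor = [i % 2 for i in range(n)]
other_factors = [i % 3 for i in range(n)]
outcomes = [a if i % 5 != 0 else 1 - a for i, a in enumerate(factor)]

p_value = conditional_randomization_test(
    outcomes, factor, other_factors, levels=[0, 1], statistic=abs_diff_in_means
)
print(p_value)
```

The p-value relies only on the known randomization of the factor, so it stays valid whatever statistic is plugged in; the choice of statistic affects power, not validity.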
CL | Apr 13, 2020 | Code
Keyword Assisted Topic Models
Shusei Eshima, Kosuke Imai, Tomoya Sasaki
In recent years, fully automated content analysis based on probabilistic topic models has become popular among social scientists because of its scalability. The unsupervised nature of the models makes them suitable for exploring topics in a corpus without prior knowledge. However, researchers find that these models often fail to measure specific concepts of substantive interest by inadvertently creating multiple topics with similar content and combining distinct themes into a single topic. In this paper, we empirically demonstrate that providing a small number of keywords can substantially enhance the measurement performance of topic models. An important advantage of the proposed keyword assisted topic model (keyATM) is that the specification of keywords requires researchers to label topics prior to fitting a model to the data. This contrasts with a widespread practice of post-hoc topic interpretation and adjustments that compromises the objectivity of empirical findings. In our application, we find that keyATM provides more interpretable results, has better document classification performance, and is less sensitive to the number of topics than the standard topic models. Finally, we show that keyATM can also incorporate covariates and model time trends. An open-source software package is available for implementing the proposed methodology.
AP | May 14, 2019 | Code
Experimental Evaluation of Individualized Treatment Rules
Kosuke Imai, Michael Lingzhi Li
The increasing availability of individual-level data has led to numerous applications of individualized (or personalized) treatment rules (ITRs). Policy makers often wish to empirically evaluate ITRs and compare their relative performance before implementing them in a target population. We propose a new evaluation metric, the population average prescriptive effect (PAPE). The PAPE compares the performance of an ITR with that of a non-individualized treatment rule that randomly treats the same proportion of units. Averaging the PAPE over a range of budget constraints yields our second evaluation metric, the area under the prescriptive effect curve (AUPEC). The AUPEC represents an overall performance measure for evaluation, as the area under the receiver operating characteristic curve (AUROC) does for classification, and is a generalization of the QINI coefficient utilized in uplift modeling. We use Neyman's repeated sampling framework to estimate the PAPE and AUPEC and derive their exact finite-sample variances based on random sampling of units and random assignment of treatment. We extend our methodology to a common setting in which the same experimental data are used to both estimate and evaluate ITRs. In this case, our variance calculation incorporates the additional uncertainty due to random splits of data used for cross-validation. The proposed evaluation metrics can be estimated without requiring modeling assumptions, asymptotic approximation, or resampling methods. As a result, they are applicable to any ITR, including those based on complex machine learning algorithms. An open-source software package is available for implementing the proposed methodology.
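The comparison that defines the PAPE can be sketched with a plug-in, difference-in-means style calculation. This is an illustrative sketch under assumptions, not the paper's estimator: it assumes a completely randomized experiment with a binary treatment, it omits any finite-sample correction the authors may apply, and the function name and data are hypothetical.

```python
def pape_plugin(y, t, f):
    """Plug-in estimate of the population average prescriptive effect.

    y: observed outcomes
    t: randomized binary treatment indicators (1 = treated)
    f: binary ITR recommendations for the same units (1 = treat)

    Compares the estimated value of following the ITR with the estimated value
    of treating the same proportion of units at random.
    """
    n = len(y)
    n1 = sum(t)
    n0 = n - n1
    if n1 == 0 or n0 == 0:
        raise ValueError("need at least one treated and one control unit")
    p_f = sum(f) / n  # proportion of units the ITR would treat

    # Value under the ITR: treated outcomes where the ITR says treat,
    # control outcomes where it says do not treat.
    value_itr = (
        sum(yi * ti * fi for yi, ti, fi in zip(y, t, f)) / n1
        + sum(yi * (1 - ti) * (1 - fi) for yi, ti, fi in zip(y, t, f)) / n0
    )
    # Value under random treatment of the same proportion p_f.
    value_random = (
        p_f * sum(yi * ti for yi, ti in zip(y, t)) / n1
        + (1 - p_f) * sum(yi * (1 - ti) for yi, ti in zip(y, t)) / n0
    )
    return value_itr - value_random


# Hypothetical experiment with eight units.
t = [1, 1, 1, 1, 0, 0, 0, 0]
f = [1, 1, 0, 0, 1, 1, 0, 0]
y = [1, 1, 0, 0, 0, 0, 1, 1]

pape = pape_plugin(y, t, f)
print(pape)
```

A positive value indicates that the rule does better than treating the same share of units at random; averaging such comparisons over a range of treated proportions is the idea behind the AUPEC.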
ME | Dec 20, 2018 | Code
Robust Estimation of Causal Effects via High-Dimensional Covariate Balancing Propensity Score
Yang Ning, Sida Peng, Kosuke Imai
In this paper, we propose a robust method to estimate the average treatment effects in observational studies when the number of potential confounders is possibly much greater than the sample size. We first use a class of penalized M-estimators for the propensity score and outcome models. We then calibrate the initial estimate of the propensity score by balancing a carefully selected subset of covariates that are predictive of the outcome. Finally, the estimated propensity score is used to construct the inverse probability weighting estimator. We prove that the proposed estimator, which has the sample boundedness property, is root-n consistent, asymptotically normal, and semiparametrically efficient when the propensity score model is correctly specified and the outcome model is linear in covariates. More importantly, we show that our estimator remains root-n consistent and asymptotically normal so long as either the propensity score model or the outcome model is correctly specified. We provide valid confidence intervals in both cases and further extend these results to the case where the outcome model is a generalized linear model. In simulation studies, we find that the proposed methodology often estimates the average treatment effect more accurately than the existing methods. We also present an empirical application, in which we estimate the average causal effect of college attendance on adulthood political participation. Open-source software is available for implementing the proposed methodology.
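The final step, turning estimated propensity scores into an inverse probability weighting estimate, can be sketched as follows. The sketch uses the normalized (Hajek) form, whose weighted means stay within the range of the observed outcomes; whether this matches the paper's exact estimator is an assumption. The propensity scores are taken as given, so the penalized estimation and covariate balancing calibration steps are not shown, and the function name and data are hypothetical.

```python
def normalized_ipw_ate(y, t, propensity):
    """Normalized inverse probability weighting estimate of the ATE.

    y: observed outcomes
    t: binary treatment indicators (1 = treated)
    propensity: estimated P(T = 1 | X) for each unit, strictly between 0 and 1
    """
    if any(not 0.0 < e < 1.0 for e in propensity):
        raise ValueError("propensity scores must lie strictly between 0 and 1")

    # Weight treated units by 1 / e and control units by 1 / (1 - e).
    w1 = [ti / e for ti, e in zip(t, propensity)]
    w0 = [(1 - ti) / (1 - e) for ti, e in zip(t, propensity)]
    if sum(w1) == 0 or sum(w0) == 0:
        raise ValueError("need at least one treated and one control unit")

    # Dividing by the sum of weights makes each arm a weighted average.
    mean_treated = sum(w * yi for w, yi in zip(w1, y)) / sum(w1)
    mean_control = sum(w * yi for w, yi in zip(w0, y)) / sum(w0)
    return mean_treated - mean_control


# Hypothetical data with constant propensity scores, where the estimate
# reduces to a simple difference in means.
y = [3.0, 1.0, 2.0, 0.0]
t = [1, 1, 0, 0]
propensity = [0.5, 0.5, 0.5, 0.5]

ate = normalized_ipw_ate(y, t, propensity)
print(ate)
```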
CL | Apr 24
Using Embedding Models to Improve Probabilistic Race Prediction
Noan Dasanaike, Kosuke Imai
Estimating racial disparity requires individual-level race data, which are often unavailable due to the sensitivity of collecting such information. To address this problem, many researchers utilize Bayesian Improved Surname Geocoding (BISG), which has critically relied on Census surname data. Unfortunately, these data capture race-surname relationships only for common surnames, omitting approximately 10% of the US population. We show that predictive performance degrades substantially for individuals with such omitted, uncommon surnames because the standard BISG implementation relies on an uninformative generic prior in these cases. To address this limitation, we propose embedding-powered BISG (eBISG), which uses pre-trained text embeddings to represent names as dense vectors and trains neural networks on 2020 Census surname and first-name data to estimate race probabilities for names not covered in the Census. We compare five approaches: standard BISG using only surnames, BIFSG incorporating first name probabilities, surname embedding for unlisted names, surname and first name embedding combining both, and a full-name embedding trained on voter file data from Southern states that captures interactions between name components. We show that each successive eBISG approach improves race prediction, with the full-name embedding yielding the largest gains, particularly for Hispanic and Asian voters whose surnames are absent from the Census list.
AI | Mar 18, 2024
Does AI help humans make better decisions? A statistical evaluation framework for experimental and observational studies
Eli Ben-Michael, D. James Greiner, Melody Huang et al.
The use of Artificial Intelligence (AI), or more generally data-driven algorithms, has become ubiquitous in today's society. Yet, in many cases and especially when stakes are high, humans still make final decisions. The critical question, therefore, is whether AI helps humans make better decisions compared to a human-alone or AI-alone system. We introduce a new methodological framework to empirically answer this question with a minimal set of assumptions. We measure a decision maker's ability to make correct decisions using standard classification metrics based on the baseline potential outcome. We consider a single-blinded and unconfounded treatment assignment, where the provision of AI-generated recommendations is assumed to be randomized across cases with humans making final decisions. Under this study design, we show how to compare the performance of three alternative decision-making systems--human-alone, human-with-AI, and AI-alone. Importantly, the AI-alone system includes any individualized treatment assignment, including those that are not used in the original study. We also show when AI recommendations should be provided to a human decision maker, and when one should follow such recommendations. We apply the proposed methodology to our own randomized controlled trial evaluating a pretrial risk assessment instrument. We find that the risk assessment recommendations do not improve the classification accuracy of a judge's decision to impose cash bail. Furthermore, we find that replacing a human judge with algorithms--the risk assessment score and a large language model in particular--leads to worse classification performance.
LG | Jul 7, 2025
Bridging Prediction and Intervention Problems in Social Systems
Lydia T. Liu, Inioluwa Deborah Raji, Angela Zhou et al.
Many automated decision systems (ADS) are designed to solve prediction problems -- where the goal is to learn patterns from a sample of the population and apply them to individuals from the same population. In reality, these prediction systems operationalize holistic policy interventions in deployment. Once deployed, ADS can shape impacted population outcomes through an effective policy change in how decision-makers operate, while also being defined by past and present interactions between stakeholders and the limitations of existing organizational, as well as societal, infrastructure and context. In this work, we consider the ways in which we must shift from a prediction-focused paradigm to an interventionist paradigm when considering the impact of ADS within social systems. We argue this requires a new default problem setup for ADS beyond prediction, to instead consider predictions as decision support, final decisions, and outcomes. We highlight how this perspective unifies modern statistical frameworks and other tools to study the design, implementation, and evaluation of ADS, and point to the research directions necessary to operationalize this paradigm shift. Using these tools, we characterize the limitations of focusing on isolated prediction tasks, and lay the foundation for a more intervention-oriented approach to developing and deploying ADS.
ST | May 13, 2025
Statistical Decision Theory with Counterfactual Loss
Benedikt Koch, Kosuke Imai
Many researchers have applied classical statistical decision theory to evaluate treatment choices and learn optimal policies. However, because this framework is based solely on realized outcomes under chosen decisions and ignores counterfactual outcomes, it cannot assess the quality of a decision relative to feasible alternatives. For example, in bail decisions, a judge must consider not only crime prevention but also the avoidance of unnecessary burdens on arrestees. To address this limitation, we generalize standard decision theory by incorporating counterfactual losses, allowing decisions to be evaluated using all potential outcomes. The central challenge in this counterfactual statistical decision framework is identification: since only one potential outcome is observed for each unit, the associated counterfactual risk is generally not identifiable. We prove that, under the assumption of strong ignorability, the counterfactual risk is identifiable if and only if the counterfactual loss function is additive in the potential outcomes. Moreover, we demonstrate that additive counterfactual losses can yield treatment recommendations, which differ from those based on standard loss functions when the decision problem involves more than two treatment options. One interpretation of this result is that additive counterfactual losses can capture the accuracy and difficulty of a decision, whereas standard losses account for accuracy alone. Finally, we formulate a symbolic linear inverse program that, given a counterfactual loss, determines whether its risk is identifiable, without requiring data.
MM | Mar 28, 2025
Using AI to Summarize US Presidential Campaign TV Advertisement Videos, 1952-2012
Adam Breuer, Bryce J. Dietrich, Michael H. Crespin et al.
This paper introduces the largest and most comprehensive dataset of US presidential campaign television advertisements, available in digital format. The dataset also includes machine-searchable transcripts and high-quality summaries designed to facilitate a variety of academic research. To date, there has been great interest in collecting and analyzing US presidential campaign advertisements, but the need for manual procurement and annotation led many to rely on smaller subsets. We design a large-scale parallelized, AI-based analysis pipeline that automates the laborious process of preparing, transcribing, and summarizing videos. We then apply this methodology to the 9,707 presidential ads from the Julian P. Kanter Political Commercial Archive. We conduct extensive human evaluations to show that these transcripts and summaries match the quality of manually generated alternatives. We illustrate the value of this data by including an application that tracks the genesis and evolution of current focal issue areas over seven decades of presidential elections. Our analysis pipeline and codebase also show how to use LLM-based tools to obtain high-quality summaries for other video datasets.
ME | Apr 25, 2024
Neyman Meets Causal Machine Learning: Experimental Evaluation of Individualized Treatment Rules
Michael Lingzhi Li, Kosuke Imai
A century ago, Neyman showed how to evaluate the efficacy of treatment using a randomized experiment under a minimal set of assumptions. This classical repeated sampling framework serves as a basis of routine experimental analyses conducted by today's scientists across disciplines. In this paper, we demonstrate that Neyman's methodology can also be used to experimentally evaluate the efficacy of individualized treatment rules (ITRs), which are derived by modern causal machine learning algorithms. In particular, we show how to account for additional uncertainty resulting from a training process based on cross-fitting. The primary advantage of Neyman's approach is that it can be applied to any ITR regardless of the properties of machine learning algorithms that are used to derive the ITR. We also show, somewhat surprisingly, that for certain metrics, it is more efficient to conduct this ex-post experimental evaluation of an ITR than to conduct an ex-ante experimental evaluation that randomly assigns some units to the ITR. Our analysis demonstrates that Neyman's repeated sampling framework is as relevant for causal inference today as it has been since its inception.
LG | Mar 11, 2024
Cramming Contextual Bandits for On-policy Statistical Evaluation
Zeyang Jia, Kosuke Imai, Michael Lingzhi Li
We introduce the cram method as a general statistical framework for evaluating the final learned policy from a multi-armed contextual bandit algorithm, using the dataset generated by the same bandit algorithm. The proposed on-policy evaluation methodology differs from most existing methods that focus on off-policy performance evaluation of contextual bandit algorithms. Cramming utilizes an entire bandit sequence through a single pass of data, leading to both statistically and computationally efficient evaluation. We prove that if a bandit algorithm satisfies a certain stability condition, the resulting crammed evaluation estimator is consistent and asymptotically normal under mild regularity conditions. Furthermore, we show that this stability condition holds for commonly used linear contextual bandit algorithms, including epsilon-greedy, Thompson Sampling, and Upper Confidence Bound algorithms. Using both synthetic and publicly available datasets, we compare the empirical performance of cramming with the state-of-the-art methods. The results demonstrate that the proposed cram method reduces the evaluation standard error by approximately 40% relative to off-policy evaluation methods while preserving unbiasedness and valid confidence interval coverage.
ML | Sep 22, 2021
Safe Policy Learning through Extrapolation: Application to Pre-trial Risk Assessment
Eli Ben-Michael, D. James Greiner, Kosuke Imai et al.
Algorithmic recommendations and decisions have become ubiquitous in today's society. Many of these data-driven policies, especially in the realm of public policy, are based on known, deterministic rules to ensure their transparency and interpretability. We examine a particular case of algorithmic pre-trial risk assessments in the US criminal justice system, which provide deterministic classification scores and recommendations to help judges make release decisions. Our goal is to analyze data from a unique field experiment on an algorithmic pre-trial risk assessment to investigate whether the scores and recommendations can be improved. Unfortunately, prior methods for policy learning are not applicable because they require existing policies to be stochastic. We develop a maximin robust optimization approach that partially identifies the expected utility of a policy, and then finds a policy that maximizes the worst-case expected utility. The resulting policy has a statistical safety property, limiting the probability of producing a worse policy than the existing one, under structural assumptions about the outcomes. Our analysis of data from the field experiment shows that we can safely improve certain components of the risk assessment instrument by classifying arrestees as lower risk under a wide range of utility specifications, though the analysis is not informative about several components of the instrument.
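The maximin idea described above can be illustrated with a toy sketch: when the expected utility of each candidate policy is only partially identified, so that only a lower and an upper bound are known, the maximin rule picks the policy whose worst-case (lower-bound) utility is largest. The policy names and bounds below are hypothetical, and this sketch is not the optimization procedure developed in the paper.

```python
def choose_maximin_policy(utility_bounds):
    """Pick the policy with the largest worst-case expected utility.

    utility_bounds maps a policy name to a (lower, upper) pair that
    bounds its partially identified expected utility.
    """
    return max(utility_bounds, key=lambda name: utility_bounds[name][0])


# Hypothetical bounds, for illustration only.
bounds = {
    "existing_rule": (0.40, 0.40),  # point identified: the status quo is observed
    "candidate_a": (0.20, 0.90),    # large upside, poor worst case
    "candidate_b": (0.45, 0.55),    # modest upside, better worst case
}
best = choose_maximin_policy(bounds)
```

Here `candidate_b` is selected even though `candidate_a` has the higher upper bound, because only `candidate_b` is guaranteed to do at least as well as the existing rule.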
ME Feb 23, 2021
Estimating Average Treatment Effects with Support Vector Machines
Alexander Tarr, Kosuke Imai
Support vector machine (SVM) is one of the most popular classification algorithms in the machine learning literature. We demonstrate that SVM can be used to balance covariates and estimate average causal effects under the unconfoundedness assumption. Specifically, we adapt the SVM classifier as a kernel-based weighting procedure that minimizes the maximum mean discrepancy between the treatment and control groups while simultaneously maximizing effective sample size. We also show that SVM is a continuous relaxation of the quadratic integer program for computing the largest balanced subset, establishing its direct relation to the cardinality matching method. Another important feature of SVM is that the regularization parameter controls the trade-off between covariate balance and effective sample size. As a result, the existing SVM path algorithm can be used to compute the balance-sample size frontier. We characterize the bias of causal effect estimation arising from this trade-off, connecting the proposed SVM procedure to the existing kernel balancing methods. Finally, we conduct simulation and empirical studies to evaluate the performance of the proposed methodology and find that SVM is competitive with the state-of-the-art covariate balancing methods.
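The two quantities traded off above can be made concrete with a short sketch: a weighted squared maximum mean discrepancy (MMD) between the treatment and control groups, and the Kish effective sample size of a weight vector. The Gaussian (RBF) kernel, the `gamma` parameter, and all function names are assumptions chosen for illustration; this is not the paper's SVM-based weighting procedure.

```python
import math


def rbf(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two covariate vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def _normalize(weights, n):
    """Return weights scaled to sum to one (uniform if none are given)."""
    if weights is None:
        return [1.0 / n] * n
    total = sum(weights)
    return [w / total for w in weights]


def mmd_squared(treated, control, w_t=None, w_c=None, gamma=1.0):
    """Weighted squared MMD (V-statistic form) between two groups."""
    w_t = _normalize(w_t, len(treated))
    w_c = _normalize(w_c, len(control))
    tt = sum(wi * wj * rbf(xi, xj, gamma)
             for wi, xi in zip(w_t, treated) for wj, xj in zip(w_t, treated))
    cc = sum(wi * wj * rbf(xi, xj, gamma)
             for wi, xi in zip(w_c, control) for wj, xj in zip(w_c, control))
    tc = sum(wi * wj * rbf(xi, xj, gamma)
             for wi, xi in zip(w_t, treated) for wj, xj in zip(w_c, control))
    return tt + cc - 2.0 * tc


def effective_sample_size(weights):
    """Kish effective sample size: (sum w)^2 / sum w^2."""
    return sum(weights) ** 2 / sum(w ** 2 for w in weights)
```

Concentrating weight on a few well-matched units can lower the MMD but also lowers the effective sample size, which is the trade-off the regularization parameter is said to control.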
CY May 21, 2020
Principal Fairness for Human and Algorithmic Decision-Making
Kosuke Imai, Zhichao Jiang
Using the concept of principal stratification from the causal inference literature, we introduce a new notion of fairness, called principal fairness, for human and algorithmic decision-making. The key idea is that one should not discriminate among individuals who would be similarly affected by the decision. Unlike the existing statistical definitions of fairness, principal fairness explicitly accounts for the fact that individuals can be impacted by the decision. Furthermore, we explain how principal fairness differs from the existing causality-based fairness criteria. In contrast to the counterfactual fairness criteria, for example, principal fairness considers the effects of the decision in question rather than those of the protected attributes of interest. We briefly discuss how to approach empirical evaluation and policy learning problems under the proposed principal fairness criterion.
CL Dec 2, 2019
Large-scale text processing pipeline with Apache Spark
Alexey Svyatkovskiy, Kosuke Imai, Mary Kroeger et al.
In this paper, we evaluate Apache Spark for a data-intensive machine learning problem. Our use case focuses on policy diffusion detection across the state legislatures in the United States over time. Previous work on policy diffusion has been unable to make an all-pairs comparison between bills due to computational intensity. As a substitute, scholars have studied single topic areas. We provide an implementation of this analysis workflow as a distributed text processing pipeline using Spark dataframes and the Scala application programming interface. We discuss the challenges and strategies of unstructured data processing, data formats for storage and efficient access, and graph processing at scale.
ME Oct 15, 2019
Discussion of "The Blessings of Multiple Causes" by Wang and Blei
Kosuke Imai, Zhichao Jiang
This commentary has two goals. We first critically review the deconfounder method and point out its advantages and limitations. We then briefly consider three possible ways to address some of the limitations of the deconfounder method.