LG · Jul 10, 2024
A Critical Review of Causal Reasoning Benchmarks for Large Language Models
Linying Yang, Vik Shirvaikar, Oscar Clivio et al.
Numerous benchmarks aim to evaluate the capabilities of Large Language Models (LLMs) for causal inference and reasoning. However, many of them can likely be solved through the retrieval of domain knowledge, which calls into question whether they achieve their purpose. In this review, we present a comprehensive overview of LLM benchmarks for causality. We highlight how recent benchmarks move towards a more thorough definition of causal reasoning by incorporating interventional or counterfactual reasoning. We derive a set of criteria that a useful benchmark, or set of benchmarks, should aim to satisfy. We hope this work will pave the way towards a general framework for the assessment of causal understanding in LLMs and the design of novel benchmarks.
ME · Sep 26, 2023
Targeting relative risk heterogeneity with causal forests
Vik Shirvaikar, Andrea Storås, Xi Lin et al.
The identification of heterogeneous treatment effects (HTE) across subgroups is of significant interest in clinical trial analysis. Several state-of-the-art HTE estimation methods, including causal forests, apply recursive partitioning for non-parametric identification of relevant covariates and interactions. However, the partitioning criterion is typically based on differences in absolute risk. This can dilute statistical power by masking variation in the relative risk, which is often a more appropriate quantity of clinical interest. In this work, we propose and implement a methodology for modifying causal forests to target relative risk, using a novel node-splitting procedure based on exhaustive generalized linear model comparison. We present results from simulated data that suggest relative risk causal forests can capture otherwise undetected sources of heterogeneity. We implement our method on real-world trial data to explore HTEs for liraglutide in patients with type 2 diabetes.
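The point that an absolute-risk criterion can mask relative-risk variation can be seen with a toy calculation. The sketch below uses made-up event risks for two hypothetical subgroups, A and B. It is not the paper's node-splitting procedure (which compares generalized linear models exhaustively). It only shows that two subgroups can share the same absolute risk difference while their relative risks differ.

```python
def risk_summary(p_control, p_treated):
    """Return (absolute risk difference, relative risk) for one subgroup."""
    ard = p_treated - p_control  # difference in absolute risk
    rr = p_treated / p_control   # ratio of risks (relative risk)
    return ard, rr

# Hypothetical event risks (control, treated); illustrative numbers only.
subgroups = {"A": (0.40, 0.30), "B": (0.20, 0.10)}

if __name__ == "__main__":
    for name, (pc, pt) in subgroups.items():
        ard, rr = risk_summary(pc, pt)
        print(f"{name}: ARD={ard:+.2f}, RR={rr:.2f}")
```

Both subgroups have an absolute risk difference of -0.10, so a split that looks only at absolute risk sees no heterogeneity. The relative risks (0.75 vs 0.50) nonetheless differ, which is the kind of signal a relative-risk criterion is meant to pick up.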
LG · Nov 19, 2020
Rethinking recidivism through a causal lens
Vik Shirvaikar, Choudur Lakshminarayan
Predictive modeling of criminal recidivism, or whether people will re-offend in the future, has a long and contentious history. Modern causal inference methods allow us to move beyond prediction and target the "treatment effect" of a specific intervention on an outcome in an observational dataset. In this paper, we look specifically at the effect of incarceration (prison time) on recidivism, using a well-known dataset from North Carolina. Two popular causal methods for addressing confounding bias are explained and demonstrated: directed acyclic graph (DAG) adjustment and double machine learning (DML), including a sensitivity analysis for unobserved confounders. We find that incarceration has a detrimental effect on recidivism, i.e., longer prison sentences make it more likely that individuals will re-offend after release, although this conclusion should not be generalized beyond the scope of our data. We hope that this case study can inform future applications of causal inference to criminal justice analysis.
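To make the double machine learning (DML) idea more concrete, here is a minimal sketch of the cross-fitted "partialling-out" estimator for a partially linear model Y = θ·D + g(X) + ε. Several assumptions apply. The nuisance models E[Y|X] and E[D|X] are fit with ordinary least squares only to keep the example dependency-free (DML normally plugs in flexible ML learners here). The data come from a hypothetical simulation (`simulate`), not the North Carolina dataset. The function names and parameters are illustrative, not the paper's code.

```python
import numpy as np

def dml_plr(y, d, X, n_folds=2, seed=0):
    """Cross-fitted partialling-out estimate of theta in Y = theta*D + g(X) + e.

    Nuisance regressions use OLS here purely for illustration; any learner
    could be substituted for the two lstsq fits.
    """
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = rng.permutation(n) % n_folds       # random, balanced fold labels
    Xc = np.column_stack([np.ones(n), X])      # add an intercept column
    y_res, d_res = np.empty(n), np.empty(n)
    for k in range(n_folds):
        train, test = folds != k, folds == k
        # Fit nuisance models on the other folds (cross-fitting)...
        beta_y, *_ = np.linalg.lstsq(Xc[train], y[train], rcond=None)
        beta_d, *_ = np.linalg.lstsq(Xc[train], d[train], rcond=None)
        # ...and residualize the held-out fold.
        y_res[test] = y[test] - Xc[test] @ beta_y
        d_res[test] = d[test] - Xc[test] @ beta_d
    # Regress outcome residuals on treatment residuals.
    return np.sum(d_res * y_res) / np.sum(d_res * d_res)

def simulate(n=2000, theta=2.0, seed=1):
    """Hypothetical confounded data: X drives both treatment D and outcome Y."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 3))
    d = X @ np.array([1.0, -0.5, 0.2]) + rng.normal(size=n)
    y = theta * d + X @ np.array([0.5, 1.0, -1.0]) + rng.normal(size=n)
    return y, d, X

if __name__ == "__main__":
    y, d, X = simulate()
    print(f"DML estimate of theta: {dml_plr(y, d, X):.3f}")
```

Residualizing both Y and D on X removes the confounding path through X before the effect is estimated, and cross-fitting keeps each observation's nuisance predictions out of sample. The paper's sensitivity analysis for unobserved confounders is not covered by this sketch.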