ME LG AP Aug 23, 2020

Stable discovery of interpretable subgroups via calibration in causal studies

arXiv:2008.10109v2
36 citations
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of reliable subgroup discovery in randomized experiments, offering clinical researchers a stable and interpretable method, though it builds incrementally on the existing PCS framework.

The paper tackles the problem of identifying patient subgroups with large heterogeneous treatment effects in randomized experiments. It introduces StaDISC, a methodology that uses calibration to show that CATE estimators perform poorly globally but are locally well-calibrated and stable, which enables the discovery of interpretable subgroups. For example, in the VIGOR study it identified three subgroups each for the GI and CVT outcomes, totaling 29.4% and 11.0% of patients respectively, with supporting evidence from the independent APPROVe study.

Building on Yu and Kumbier's PCS framework, we introduce a novel methodology for randomized experiments, Stable Discovery of Interpretable Subgroups via Calibration (StaDISC), which targets subgroups with large heterogeneous treatment effects. StaDISC was developed during our re-analysis of the 1999-2000 VIGOR study, an 8076-patient randomized controlled trial (RCT) that compared the risk of adverse events from a then newly approved drug, Rofecoxib (Vioxx), to that from an older drug, Naproxen. Vioxx was found to, on average and in comparison to Naproxen, reduce the risk of gastrointestinal (GI) events but increase the risk of thrombotic cardiovascular (CVT) events. Applying StaDISC, we fit 18 popular conditional average treatment effect (CATE) estimators for both outcomes and use calibration to demonstrate their poor global performance. However, they are locally well-calibrated and stable, enabling the identification of patient groups with larger than (estimated) average treatment effects. In fact, StaDISC discovers three clinically interpretable subgroups each for the GI outcome (totaling 29.4% of the study size) and the CVT outcome (totaling 11.0%). Complementary analyses of the found subgroups using the 2001-2004 APPROVe study, a separate, independently conducted RCT with 2587 patients, provide further supporting evidence for the promise of StaDISC.
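
The abstract's notion of checking CATE estimators by calibration can be illustrated with a minimal sketch. This is not the authors' StaDISC implementation: the function name `calibration_by_quantile`, the choice of four quantile bins, and the synthetic data are all assumptions made for illustration. The idea shown is that, in a randomized experiment, units can be binned by their predicted CATE, and the mean prediction in each bin can be compared with the observed difference in mean outcomes between treated and control units in that bin.

```python
import numpy as np


def calibration_by_quantile(cate_pred, treatment, outcome, n_bins=4):
    """Compare predicted and observed treatment effects within quantile bins.

    Hypothetical helper for illustration only. Returns a list of tuples:
    (bin index, bin size, mean predicted CATE, observed difference in means).
    """
    cate_pred = np.asarray(cate_pred, dtype=float)
    treatment = np.asarray(treatment, dtype=int)
    outcome = np.asarray(outcome, dtype=float)

    # Bin edges sit at equally spaced quantiles of the predicted CATE.
    edges = np.quantile(cate_pred, np.linspace(0.0, 1.0, n_bins + 1))
    # Map each unit to a bin index in 0..n_bins-1.
    bins = np.clip(
        np.searchsorted(edges, cate_pred, side="right") - 1, 0, n_bins - 1
    )

    rows = []
    for b in range(n_bins):
        in_bin = bins == b
        treated = in_bin & (treatment == 1)
        control = in_bin & (treatment == 0)
        if treated.sum() == 0 or control.sum() == 0:
            continue  # no valid contrast without both arms in the bin
        # Under randomization, the difference in means estimates the
        # average treatment effect among units in this bin.
        observed = outcome[treated].mean() - outcome[control].mean()
        rows.append(
            (b, int(in_bin.sum()), float(cate_pred[in_bin].mean()), float(observed))
        )
    return rows


# Synthetic randomized experiment (assumed data, not VIGOR or APPROVe).
rng = np.random.default_rng(0)
n = 8000
x = rng.uniform(0.0, 1.0, size=n)           # a single covariate
t = rng.integers(0, 2, size=n)              # randomized treatment assignment
true_cate = 2.0 * x                         # effect grows with the covariate
y = x + t * true_cate + rng.normal(0.0, 1.0, size=n)
pred = true_cate + rng.normal(0.0, 0.1, size=n)  # a noisy CATE estimate

rows = calibration_by_quantile(pred, t, y, n_bins=4)
for b, size, mean_pred, observed in rows:
    print(f"bin {b}: n={size}, predicted={mean_pred:.2f}, observed={observed:.2f}")
```

An estimator can track the observed effects poorly across all bins yet agree with them in the top bins. That pattern corresponds to what the abstract describes as poor global performance with good local calibration.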

Code Implementations: 1 repo
