Identifying Patient-Specific Root Causes of Disease
This work addresses a challenge in personalized medicine: enabling more precise diagnosis and treatment for patients with complex diseases. The contribution is incremental, since it builds on existing causal inference and Shapley value methods.
The paper identifies patient-specific root causes of complex diseases by defining them as variables subject to exogenous shocks in a structural equation model and quantifying their predictivity with Shapley values. The result is a fast algorithm that improves accuracy by uncovering root causes with large individual-level effect sizes.
Complex diseases are caused by a multitude of factors that may differ between patients. As a result, hypothesis tests comparing all patients to all healthy controls can detect many significant variables with inconsequential effect sizes. A few highly predictive root causes may nevertheless generate disease within each patient. In this paper, we define patient-specific root causes as variables subject to exogenous "shocks" which go on to perturb an otherwise healthy system and induce disease. In other words, the variables are associated with the exogenous errors of a structural equation model (SEM), and these errors predict a downstream diagnostic label. We quantify predictivity using sample-specific Shapley values. This derivation allows us to develop a fast algorithm called Root Causal Inference for identifying patient-specific root causes by extracting the error terms of a linear SEM and then computing the Shapley value associated with each error. Experiments highlight considerable improvements in accuracy because the method uncovers root causes that may have large effect sizes at the individual level but clinically insignificant effect sizes at the group level. An R implementation is available at github.com/ericstrobl/RCI.
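The two steps described above (extract the error terms of a linear SEM, then compute a Shapley value for each error) can be illustrated with a minimal Python sketch. It is not the authors' R implementation, and it rests on several assumptions that are not in the abstract: the causal order of the variables is taken as known, the label is predicted from the errors by a least-squares linear model, and the errors are treated as mutually independent, so that the Shapley value of error i for a given sample reduces to the closed form beta_i * (e_i - mean(e_i)). The function names `extract_errors` and `linear_shapley`, the three-variable chain, and the simulated diagnostic score are all hypothetical.

```python
import numpy as np

def extract_errors(X, order):
    """Estimate SEM error terms by residualizing each variable on its
    predecessors in an ASSUMED known causal order (a simplification)."""
    Xc = X - X.mean(axis=0)
    E = np.zeros_like(Xc)
    for k, j in enumerate(order):
        preds = order[:k]
        if not preds:
            # A variable with no predecessors is its own error term.
            E[:, j] = Xc[:, j]
            continue
        coef, *_ = np.linalg.lstsq(Xc[:, preds], Xc[:, j], rcond=None)
        E[:, j] = Xc[:, j] - Xc[:, preds] @ coef
    return E

def linear_shapley(E, y):
    """Fit y on the errors by least squares and return sample-specific
    Shapley values, using the closed form for a linear model with
    independent features: phi[s, i] = beta[i] * (E[s, i] - mean(E[:, i]))."""
    Ec = E - E.mean(axis=0)
    beta, *_ = np.linalg.lstsq(Ec, y - y.mean(), rcond=None)
    return Ec * beta  # shape (n_samples, n_variables)

# Simulated data (hypothetical): a chain X0 -> X1 -> X2 with Laplace errors.
rng = np.random.default_rng(0)
n = 2000
E_true = rng.laplace(size=(n, 3))
X = np.empty((n, 3))
X[:, 0] = E_true[:, 0]
X[:, 1] = 0.8 * X[:, 0] + E_true[:, 1]
X[:, 2] = -0.5 * X[:, 1] + E_true[:, 2]
# Hypothetical diagnostic score driven by the shock on variable 1.
y = 1.5 * E_true[:, 1] + 0.1 * rng.normal(size=n)

E_hat = extract_errors(X, order=[0, 1, 2])
phi = linear_shapley(E_hat, y)

# Rank candidate root causes for one patient by absolute Shapley value.
patient = 0
ranking = np.argsort(-np.abs(phi[patient]))
print("patient", patient, "ranking of variables:", ranking)
```

Each row of `phi` gives the contribution of every error term to that patient's predicted score, so a ranking can differ from patient to patient even when the group-level coefficients are fixed. In practice the causal order would have to be learned from data and a classifier suited to a binary diagnosis would replace the linear fit; both are left out to keep the sketch short.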