How important are the genes to explain the outcome - the asymmetric Shapley value as an honest importance metric for high-dimensional features
This work gives clinicians and researchers a more robust way to quantify how much genomic features contribute to predicting patient outcomes, particularly in clinical settings where the usual approach (measuring the gain in predictive performance when genomics is added to clinical variables) ignores collinearity and known directional dependencies.
This paper addresses the problem of quantifying feature importance for high-dimensional genomic data in clinical prediction, where traditional methods struggle with collinearity and known directional dependencies. The authors propose using asymmetric Shapley values to honestly quantify feature importance, particularly when disease state mediates genomic effects and confounders are present.
In clinical prediction settings, the importance of a high-dimensional feature such as genomics is often assessed by the change in predictive performance when it is added to a set of traditional clinical variables. This approach is questionable because it accounts for neither collinearity nor the known directionality of dependencies between variables. We propose asymmetric Shapley values as a more suitable alternative for quantifying feature importance in a mixed-dimensional prediction model. We focus on a setting that is particularly relevant in clinical prediction: disease state as a variable mediating genomic effects, with additional confounders whose direction of effect may be unknown. We derive efficient algorithms to compute local and global asymmetric Shapley values for this setting. The former are shown to be very useful for inference, whereas the latter aid interpretation by decomposing any predictive performance metric into contributions of the features. Throughout, we illustrate our framework with a leading example: the prediction of progression-free survival for colorectal cancer patients.
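To make the idea of an asymmetric Shapley value concrete, the following is a minimal brute-force sketch in Python, not the efficient algorithms derived in the paper. It averages each feature group's marginal contribution only over orderings that respect a known causal order. It assumes three hypothetical feature groups ("genes", "stage", "age"), that the mediating disease state ("stage") must come after the genes, and that the confounder ("age") is left unconstrained. The function name `asymmetric_shapley` and all performance values are invented for illustration only. The value function stands in for a global metric, i.e. the predictive performance of a model restricted to a subset of features.

```python
from itertools import permutations


def asymmetric_shapley(players, value, precedes):
    """Brute-force asymmetric Shapley values (illustrative sketch only).

    players:  feature groups treated as single players (e.g. all genes as one).
    value:    maps a frozenset of players to a number v(S), e.g. the predictive
              performance of a model that uses only the features in S.
    precedes: (a, b) pairs meaning a must come before b in every ordering.
    """
    # Keep only orderings consistent with the assumed causal ordering.
    admissible = [
        order for order in permutations(players)
        if all(order.index(a) < order.index(b) for a, b in precedes)
    ]
    totals = {p: 0.0 for p in players}
    for order in admissible:
        coalition = frozenset()
        for p in order:
            # Marginal contribution of p given the players placed before it.
            totals[p] += value(coalition | {p}) - value(coalition)
            coalition = coalition | {p}
    # Uniform average over the admissible orderings.
    return {p: t / len(admissible) for p, t in totals.items()}


# Hypothetical, made-up performance values for every subset of the three
# feature groups; they exist only to exercise the function.
V = {
    frozenset(): 0.00,
    frozenset({"genes"}): 0.30,
    frozenset({"stage"}): 0.25,
    frozenset({"age"}): 0.05,
    frozenset({"genes", "stage"}): 0.35,
    frozenset({"genes", "age"}): 0.33,
    frozenset({"stage", "age"}): 0.28,
    frozenset({"genes", "stage", "age"}): 0.38,
}

players = ["genes", "stage", "age"]
# Assumed causal constraint: genes act on the outcome partly via disease stage.
asym = asymmetric_shapley(players, V.__getitem__, precedes=[("genes", "stage")])
# No constraints recovers the ordinary (symmetric) Shapley value.
sym = asymmetric_shapley(players, V.__getitem__, precedes=[])
print({p: round(v, 3) for p, v in asym.items()})
print({p: round(v, 3) for p, v in sym.items()})
```

In both cases the contributions sum to v(all features) minus v(no features), so the chosen performance metric is fully decomposed. With the ordering constraint, the genes receive credit for the information they share with the disease stage, whereas the symmetric version splits that overlap between them. Enumerating orderings grows factorially with the number of feature groups, so this sketch only suits a handful of groups.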