ML AI LG CO ME Jun 2, 2024

Bayesian Joint Additive Factor Models for Multiview Learning

Niccolo Anceschi, Federico Ferrari, David B. Dunson, Himel Mallick

arXiv:2406.00778v45.57 citations Has Code

Originality: Incremental advance

AI Analysis

The work addresses the need for interpretable and accurate prediction tools in precision medicine, though it appears incremental because it builds on existing factor models.

The paper tackles the prediction of clinical outcomes from multiview data, such as multi-omics, by introducing Bayesian joint additive factor models that capture shared and view-specific variation. The authors report performance gains against state-of-the-art competitors in predicting time-to-labor onset.

It is increasingly common to collect data of multiple different types on the same set of samples. Our focus is on studying relationships between such multiview features and responses. A motivating application arises in precision medicine, where multi-omics data are collected to correlate with clinical outcomes. It is of interest to infer dependence within and across views while combining multimodal information to improve the prediction of outcomes. The signal-to-noise ratio can vary substantially across views, motivating more nuanced statistical tools beyond standard late and early fusion. This challenge comes with the need to preserve interpretability, select features, and obtain accurate uncertainty quantification. To address these challenges, we introduce two complementary factor regression models. A baseline Joint Factor Regression (JFR) captures combined variation across views via a single factor set, while a more nuanced Joint Additive FActor Regression (JAFAR) decomposes variation into shared and view-specific components. For JFR, we use independent cumulative shrinkage process (I-CUSP) priors, while for JAFAR we develop a dependent version (D-CUSP) designed to ensure identifiability of the components. We develop Gibbs samplers that exploit the model structure and accommodate flexible feature and outcome distributions. Prediction of time-to-labor onset from immunome, metabolome, and proteome data illustrates performance gains against state-of-the-art competitors. Our open-source software (R package) is available at https://github.com/niccoloanceschi/jafar.
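The additive decomposition described in the abstract can be made concrete with a small simulation. The following is a minimal sketch in Python with NumPy, not the authors' R package or their Gibbs sampler. It assumes each view is the sum of a shared-factor term, a view-specific-factor term, and Gaussian noise with a view-dependent standard deviation, and that the response depends on the shared factors only. That last assumption, the function name `simulate_additive_factor_views`, all parameter names, and the standard normal loadings (which stand in for the CUSP shrinkage priors) are illustrative choices rather than details taken from the abstract.

```python
import numpy as np


def simulate_additive_factor_views(
    n=200,
    dims=(30, 20, 10),
    k_shared=3,
    k_specific=(2, 2, 2),
    noise_sd=(0.5, 1.0, 2.0),
    seed=0,
):
    """Draw multiview data from an additive factor structure (illustrative only).

    Each view m is generated as
        X_m = eta @ Lambda_m.T + phi_m @ Gamma_m.T + noise_m
    where eta holds factors shared by all views and phi_m holds factors
    specific to view m. The response uses the shared factors only, which
    is an assumption of this sketch.
    """
    rng = np.random.default_rng(seed)
    eta = rng.standard_normal((n, k_shared))  # shared latent factors

    views = []
    for p_m, k_m, sd_m in zip(dims, k_specific, noise_sd):
        Lambda_m = rng.standard_normal((p_m, k_shared))  # shared loadings
        Gamma_m = rng.standard_normal((p_m, k_m))        # view-specific loadings
        phi_m = rng.standard_normal((n, k_m))            # view-specific factors
        noise = sd_m * rng.standard_normal((n, p_m))     # view-dependent noise level
        views.append(eta @ Lambda_m.T + phi_m @ Gamma_m.T + noise)

    theta = rng.standard_normal(k_shared)                # response coefficients
    y = eta @ theta + 0.5 * rng.standard_normal(n)
    return views, y, eta


views, y, eta = simulate_additive_factor_views()
# Early fusion, mentioned in the abstract as a standard baseline,
# simply concatenates the views column-wise before any modeling.
X_early = np.hstack(views)
print([v.shape for v in views], X_early.shape, y.shape)
```

Varying `noise_sd` across views mimics the differing signal-to-noise ratios that the abstract cites as the motivation for going beyond early and late fusion.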
