Probabilistic Joint and Individual Variation Explained (ProJIVE) for Data Integration

Raphiel J. Murden, Ganzhong Tian, Deqiang Qiu, Benjamin B. Risk

arXiv:2603.123517.7

AI Analysis

This work addresses data integration challenges in fields such as genomics and neuroimaging, offering a probabilistic method that is an incremental advance over existing JIVE approaches.

The authors tackle the problem of integrating multiple data types measured on the same subjects by developing ProJIVE, a probabilistic model that extends JIVE to estimate joint and individual variation. Applied to Alzheimer's disease data, ProJIVE identified biologically meaningful patterns, and its joint subject scores were strongly correlated with more expensive existing biomarkers.

Collecting multiple types of data on the same set of subjects is common in modern scientific applications, including genomics, metabolomics, and neuroimaging. Joint and Individual Variation Explained (JIVE) seeks a low-rank approximation of the joint variation between two or more sets of features captured on common subjects and isolates this variation from the variation unique to each set of features. We develop an expectation-maximization (EM) algorithm to estimate a probabilistic model for the JIVE framework, which we call ProJIVE. The model extends probabilistic principal components analysis to multiple data sets. Our maximum likelihood approach estimates the joint and individual components simultaneously, which can lead to greater accuracy compared with other methods. We apply ProJIVE to measures of brain morphometry and cognition in Alzheimer's disease. ProJIVE learns biologically meaningful sources of variation, and the joint morphometry and cognition subject scores are strongly related to more expensive existing biomarkers. Data used in preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. Code to reproduce the analysis is available on our GitHub page.
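The abstract says ProJIVE extends probabilistic PCA to multiple data sets and is fit by EM, but it does not spell out the model. As an illustration only, assume a PPCA-style model in which each block k satisfies x_k = W_{J,k} z_J + W_{I,k} z_{I,k} + e_k, where the joint scores z_J ~ N(0, I) are shared by all blocks, the individual scores z_{I,k} ~ N(0, I) belong to block k alone, and e_k ~ N(0, σ_k² I) is isotropic noise. Stacking the blocks gives a factor-analysis model with block-sparse loadings, so textbook EM applies. The function `projive_em_sketch` is a hypothetical helper written for this note, not the authors' implementation (theirs is on their GitHub page). The ranks are assumed known, and the data are simulated.

```python
import numpy as np


def projive_em_sketch(blocks, r_joint, r_indiv, n_iter=200, seed=0):
    """Hypothetical EM for an assumed PPCA-style joint/individual model.

    blocks  : list of (n, p_k) arrays with centered columns, rows = common subjects.
    r_joint : assumed joint rank; r_indiv : list of assumed individual ranks.
    """
    rng = np.random.default_rng(seed)
    K, n = len(blocks), blocks[0].shape[0]
    p = [B.shape[1] for B in blocks]
    X = np.hstack(blocks)                                # (n, P) stacked features
    P, r = X.shape[1], r_joint + sum(r_indiv)
    row_start = np.concatenate([[0], np.cumsum(p)])
    col_start = np.concatenate([[r_joint], r_joint + np.cumsum(r_indiv)])
    # Block k may load only on the joint factors and on its own individual factors.
    allowed = [np.concatenate([np.arange(r_joint),
                               np.arange(col_start[k], col_start[k] + r_indiv[k])])
               for k in range(K)]
    W = rng.normal(scale=0.1, size=(P, r))
    for k in range(K):
        mask = np.ones(r, dtype=bool)
        mask[allowed[k]] = False
        W[row_start[k]:row_start[k + 1], mask] = 0.0     # enforce block-sparse loadings
    sigma2 = np.array([B.var() for B in blocks])         # one noise variance per block
    S = X.T @ X / n
    loglik = []
    for _ in range(n_iter):
        psi = np.repeat(sigma2, p)                       # diagonal of Psi
        # Marginal log-likelihood of x ~ N(0, W W^T + Psi) at the current parameters.
        C = W @ W.T + np.diag(psi)
        _, logdet = np.linalg.slogdet(C)
        loglik.append(-0.5 * n * (P * np.log(2 * np.pi) + logdet
                                  + np.trace(np.linalg.solve(C, S))))
        # E-step: Gaussian posterior of the latent scores z_n given x_n.
        WtPsiInv = W.T / psi                             # (r, P) = W^T Psi^{-1}
        Minv = np.linalg.inv(np.eye(r) + WtPsiInv @ W)   # posterior covariance
        Ez = X @ WtPsiInv.T @ Minv                       # (n, r) posterior means
        Ezz = n * Minv + Ez.T @ Ez                       # sum_n E[z_n z_n^T]
        # M-step: restricted least squares per block, then isotropic noise per block.
        for k in range(K):
            rows, cols = slice(row_start[k], row_start[k + 1]), allowed[k]
            Xk = X[:, rows]
            Wk = np.zeros((p[k], r))
            Wk[:, cols] = np.linalg.solve(Ezz[np.ix_(cols, cols)], Ez[:, cols].T @ Xk).T
            W[rows] = Wk
            resid = (np.sum(Xk ** 2) - 2 * np.sum((Xk @ Wk) * Ez)
                     + np.trace(Wk.T @ Wk @ Ezz))
            sigma2[k] = resid / (n * p[k])
    return W, sigma2, np.array(loglik)


# Simulated example: two blocks sharing one joint factor, each with one individual factor.
rng = np.random.default_rng(1)
n = 500
zJ = rng.normal(size=(n, 1))
X1 = (zJ @ rng.normal(size=(1, 6)) + rng.normal(size=(n, 1)) @ rng.normal(size=(1, 6))
      + 0.3 * rng.normal(size=(n, 6)))
X2 = (zJ @ rng.normal(size=(1, 4)) + rng.normal(size=(n, 1)) @ rng.normal(size=(1, 4))
      + 0.3 * rng.normal(size=(n, 4)))
blocks = [B - B.mean(axis=0) for B in (X1, X2)]
W, sigma2, ll = projive_em_sketch(blocks, r_joint=1, r_indiv=[1, 1])
```

Because the noise covariance is diagonal, each block's M-step reduces to least squares restricted to the joint columns and that block's own individual columns, which keeps the other blocks' individual loadings exactly zero. As in PPCA, the loadings are identified only up to a rotation within the joint subspace and within each individual subspace, and EM guarantees that the recorded log-likelihood does not decrease.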
