LG CE ML | Apr 26, 2013

Supervised Heterogeneous Multiview Learning for Joint Association Study and Disease Diagnosis

arXiv:1304.7284v2

Originality: Incremental advance

AI Analysis

This work addresses the problem of integrating genetic and phenotypical data for improved disease diagnosis and biomarker discovery in biomedical research, representing an incremental advancement by unifying previously separate tasks.

The paper addresses two tasks that are typically treated separately: selecting genetic and phenotypical markers for disease diagnosis, and identifying associations between these two data types. It proposes a sparse Bayesian approach that handles both jointly. In simulations, the method predicts ordinal labels and discovers associations more accurately than alternative methods; on an Alzheimer's Disease imaging genetics dataset, it predicts ordinal disease stages with significantly higher accuracy than competing methods.

Given genetic variations and various phenotypical traits, such as Magnetic Resonance Imaging (MRI) features, we consider two important and related tasks in biomedical research: (i) to select genetic and phenotypical markers for disease diagnosis and (ii) to identify associations between genetic and phenotypical data. These two tasks are tightly coupled because underlying associations between genetic variations and phenotypical features contain the biological basis for a disease. While a variety of sparse models have been applied for disease diagnosis, and canonical correlation analysis and its extensions have been widely used in association studies (e.g., eQTL analysis), these two tasks have been treated separately. To unify these two tasks, we present a new sparse Bayesian approach for joint association study and disease diagnosis. In this approach, common latent features are extracted from different data sources based on sparse projection matrices and used to predict multiple disease severity levels based on Gaussian process ordinal regression; in return, the disease status is used to guide the discovery of relationships between the data sources. The sparse projection matrices not only reveal interactions between data sources but also select groups of biomarkers related to the disease. To learn the model from data, we develop an efficient variational expectation maximization algorithm. Simulation results demonstrate that our approach achieves higher accuracy in both predicting ordinal labels and discovering associations between data sources than alternative methods. We apply our approach to an imaging genetics dataset for the study of Alzheimer's Disease (AD). Our method identifies biologically meaningful relationships between genetic variations, MRI features, and AD status, and achieves significantly higher accuracy for predicting ordinal AD stages than the competing methods.
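The structure described in the abstract (shared latent features, one sparse projection matrix per data source, and ordinal labels driven by the same latent features) can be illustrated with a small simulation. The sketch below is not the authors' model or inference algorithm. It assumes continuous-valued views, a random zero mask in place of a sparse Bayesian prior, and a linear score in place of the Gaussian process. The function name `simulate_multiview` and all of its parameters are hypothetical.

```python
import numpy as np

def simulate_multiview(n=200, k=3, p_gen=30, p_phe=20, n_levels=3,
                       sparsity=0.8, noise=0.1, seed=0):
    """Draw two views and ordinal labels from shared latent features.

    Illustrative assumptions (not taken from the paper): both views are
    continuous, sparsity comes from a random zero mask, and the severity
    score is linear in the latent features rather than a Gaussian process.
    """
    rng = np.random.default_rng(seed)

    # Shared latent features, one row per subject.
    U = rng.standard_normal((n, k))

    def sparse_projection(p):
        # Gaussian weights with roughly `sparsity` of the entries set to zero.
        W = rng.standard_normal((k, p))
        mask = rng.random((k, p)) >= sparsity
        return W * mask

    W_gen = sparse_projection(p_gen)   # latent -> "genetic" view
    W_phe = sparse_projection(p_phe)   # latent -> "phenotype" view

    # Each view is a noisy sparse projection of the same latent features.
    X_gen = U @ W_gen + noise * rng.standard_normal((n, p_gen))
    X_phe = U @ W_phe + noise * rng.standard_normal((n, p_phe))

    # Latent severity score (linear stand-in for the GP), cut into levels
    # at empirical quantiles so every level is populated.
    f = U @ rng.standard_normal(k) + noise * rng.standard_normal(n)
    cuts = np.quantile(f, np.linspace(0.0, 1.0, n_levels + 1)[1:-1])
    y = np.digitize(f, cuts)           # integer labels 0 .. n_levels - 1

    return X_gen, X_phe, y, (W_gen, W_phe)
```

Calling `simulate_multiview()` returns the two views, the ordinal labels, and the ground-truth projection matrices. Because both views and the labels depend on the same latent features, the nonzero columns of each projection matrix mark the features that are jointly associated across views and informative about the label, which is the coupling the abstract exploits.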
