Multi-view Factorization AutoEncoder with Network Constraints for Multi-omic Integrative Analysis
This work addresses the "big p, small N" problem in multi-omic data analysis for disease etiology, offering an incremental, domain-specific solution.
The authors tackled the challenge of integrating multi-omic data with domain knowledge to predict disease outcomes, achieving satisfactory results in predicting progression-free interval and overall survival on the TCGA Pan-cancer dataset.
Multi-omic data provides multiple views of the same patients. Integrative analysis of multi-omic data is crucial for elucidating the molecular underpinnings of disease etiology. However, multi-omic data suffers from the "big p, small N" problem (the number of features is large, but the number of samples is small), so it is challenging to train a complex machine learning model from multi-omic data alone and have it generalize well. Here we propose a framework termed Multi-view Factorization AutoEncoder with network constraints, which integrates multi-omic data with domain knowledge in the form of biological interaction networks. Our framework employs deep representation learning to learn feature embeddings and patient embeddings simultaneously, which allows us to incorporate feature interaction network constraints and patient view similarity network constraints into the training objective. The whole framework is end-to-end differentiable. We applied our approach to the TCGA Pan-cancer dataset and achieved satisfactory results in predicting disease progression-free interval (PFI) and patient overall survival (OS) events. Code will be made publicly available.
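The abstract does not spell out how the network constraints enter the training objective. One common way to encode such constraints is graph Laplacian regularization, which penalizes differences between the embeddings of nodes that are connected in a network. The sketch below illustrates only that assumption: it scores a single view whose patient-by-feature matrix X is approximated by patient embeddings P times transposed feature embeddings F, plus one smoothness penalty over a feature interaction network and another over a patient similarity network. The linear factorization, the function names and the weights `lam_f` and `lam_p` are hypothetical and are not the authors' implementation, which uses autoencoders trained end to end.

```python
from typing import List

Matrix = List[List[float]]


def laplacian_penalty(emb: Matrix, adj: Matrix) -> float:
    """Hypothetical network constraint: 0.5 * sum_ij adj[i][j] * ||emb[i] - emb[j]||^2.

    For a symmetric adjacency matrix this equals trace(E^T L E) with L = D - A,
    so nodes linked in the network are pushed toward similar embeddings.
    """
    n = len(emb)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if adj[i][j]:  # skip unconnected pairs
                sq = sum((a - b) ** 2 for a, b in zip(emb[i], emb[j]))
                total += adj[i][j] * sq
    return 0.5 * total


def reconstruction_error(x: Matrix, patients: Matrix, features: Matrix) -> float:
    """Squared Frobenius error of the assumed factorization X ~ P F^T."""
    err = 0.0
    for i, row in enumerate(x):
        for j, value in enumerate(row):
            # Predicted entry is the dot product of patient i and feature j embeddings.
            pred = sum(p * f for p, f in zip(patients[i], features[j]))
            err += (value - pred) ** 2
    return err


def objective(
    x: Matrix,
    patients: Matrix,
    features: Matrix,
    feat_adj: Matrix,
    patient_sim: Matrix,
    lam_f: float = 0.1,
    lam_p: float = 0.1,
) -> float:
    """Hypothetical single-view loss: reconstruction plus two network penalties."""
    return (
        reconstruction_error(x, patients, features)
        + lam_f * laplacian_penalty(features, feat_adj)  # feature interaction network
        + lam_p * laplacian_penalty(patients, patient_sim)  # patient similarity network
    )


# Three features on a chain network 0-1-2 with 1-D embeddings 0, 1, 3:
print(laplacian_penalty([[0.0], [1.0], [3.0]], [[0, 1, 0], [1, 0, 1], [0, 1, 0]]))  # 5.0
```

The explicit loops only make each term visible. In a differentiable framework the same penalty would be written with tensor operations, for example as trace(E^T L E), so that gradients flow back into the networks that produce the embeddings.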