Multi-Omic Data Integration and Feature Selection for Survival-based Patient Stratification via Supervised Concrete Autoencoders
This work addresses cancer patient stratification for improved survival prediction, but it is incremental as it builds on existing multi-omic integration methods with modest gains.
The authors tackled the problem of predicting survival and stratifying cancer patients using multi-omic data, developing a Supervised Autoencoder (SAE) and Concrete Supervised Autoencoder (CSAE) that outperform or match common baselines, with SAE providing better survival separation and CSAE offering improved interpretability.
Cancer is a complex disease with significant social and economic impact. Advances in high-throughput molecular assays and the falling cost of high-quality multi-omics measurements have fuelled new insights through machine learning. Previous studies have shown promise in using multiple omic layers to predict survival and stratify cancer patients. In this paper, we develop a Supervised Autoencoder (SAE) model for survival-based multi-omic integration that improves upon previous work, and we report a Concrete Supervised Autoencoder (CSAE) model, which uses feature selection to jointly reconstruct the input features and predict survival. Our experiments show that our models outperform or are on par with some of the most commonly used baselines, while either providing better survival separation (SAE) or being more interpretable (CSAE). We also perform a feature selection stability analysis on our models and observe that there is a power-law relationship with features that are commonly associated with survival. The code for this project is available at: https://github.com/phcavelar/coxae
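As an illustration of the two ingredients named above, the following is a minimal sketch, not the implementation in the linked repository. It assumes that the feature selection in the CSAE follows the usual Concrete (Gumbel-softmax) relaxation and that survival supervision is expressed as a Cox negative partial log-likelihood without tie handling; both are assumptions made for this example. The function names `concrete_selector_weights` and `cox_neg_partial_log_likelihood` are hypothetical, and the toy inputs are made up.

```python
import math
import random


def concrete_selector_weights(logits, temperature, rng):
    """Sample relaxed one-hot selection weights over the input features.

    Concrete (Gumbel-softmax) relaxation: add Gumbel noise to each logit,
    divide by the temperature, then apply a softmax. Lower temperatures
    push the weights towards a single selected feature.
    """
    noisy = []
    for logit in logits:
        u = max(rng.random(), 1e-12)  # clamp away from 0 to avoid log(0)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / temperature)
    peak = max(noisy)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def cox_neg_partial_log_likelihood(risk_scores, times, events):
    """Cox negative partial log-likelihood, averaged over observed events.

    risk_scores: predicted log-risk per patient (higher means higher risk)
    times:       follow-up time per patient
    events:      1 if the event was observed, 0 if the patient was censored
    Assumption: ties in event times are not treated specially.
    """
    total = 0.0
    n_events = 0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if not e_i:
            continue  # censored patients only appear in risk sets
        # Risk set: every patient still under observation at time t_i.
        at_risk = [risk_scores[j] for j, t_j in enumerate(times) if t_j >= t_i]
        peak = max(at_risk)
        log_sum = peak + math.log(sum(math.exp(s - peak) for s in at_risk))
        total += risk_scores[i] - log_sum
        n_events += 1
    if n_events == 0:
        return 0.0  # no observed events, so the loss carries no signal
    return -total / n_events


if __name__ == "__main__":
    rng = random.Random(0)
    # One selector node choosing among five hypothetical omic features.
    weights = concrete_selector_weights([0.0] * 5, temperature=0.5, rng=rng)
    print("selection weights:", [round(w, 3) for w in weights])
    # Toy cohort: higher scores for patients with earlier events.
    loss = cox_neg_partial_log_likelihood([2.0, 1.0, 0.0], [1, 2, 3], [1, 1, 1])
    print("Cox loss:", round(loss, 4))
```

In a supervised autoencoder of this kind, a survival term such as the one above would typically be added to a reconstruction loss, with the selector weights replacing a dense encoder layer so that each latent unit maps back to a small set of input features.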