GN | AI | CV | LG | Apr 28, 2022

Coupling Deep Imputation with Multitask Learning for Downstream Tasks on Genomics Data

arXiv:2204.13705v2 | 4 citations | h-index: 9
Originality: Synthesis-oriented
AI Analysis

This paper addresses data completeness issues in genomics for clinical predictive tasks and offers a complementary approach to improving performance, though it is incremental, as it builds on existing imputation and multitask methods.

The paper tackled missing data in genomics by combining deep imputation with multitask learning, finding that deep imputation alone outperformed multitask learning for most classification and regression tasks, while multitask learning was better for survival prediction with statistical significance (adjusted p-value 0.03).

Genomics data such as RNA gene expression, methylation and micro RNA expression are valuable sources of information for various clinical predictive tasks. For example, survival outcomes, cancer histology type and other patient-related information can be predicted using not only clinical data but molecular data as well. Moreover, using these data sources together, for example in multitask learning, can boost performance. In practice, however, there are many missing data points, which leads to significantly lower patient numbers when analysing full cases, which in our setting means cases where all modalities are present. In this paper we investigate how imputing missing values using deep learning, coupled with multitask learning, can help to reach state-of-the-art performance using combined genomics modalities: RNA, micro RNA and methylation. We propose a generalised deep imputation method to impute values where a patient has all modalities present except one. Interestingly, deep imputation alone outperforms multitask learning alone for the classification and regression tasks across most combinations of modalities. In contrast, when using all modalities for survival prediction, we observe that multitask learning alone outperforms deep imputation alone with statistical significance (adjusted p-value 0.03). Thus, both approaches are complementary when optimising performance for downstream predictive tasks.
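To make the "full cases" and "all modalities present except one" conditions from the abstract concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation: the function names (`split_cohort`, `fit_linear_imputer`, `impute`) and the toy availability table are hypothetical, and a plain least-squares map stands in for the deep imputation network, whose architecture the abstract does not describe.

```python
import numpy as np

# Modality order assumed for this illustration only.
MODALITIES = ("rna", "mirna", "methylation")


def split_cohort(available):
    """Split patients by modality availability.

    available: boolean array of shape (n_patients, n_modalities),
    True where a modality was measured for that patient.
    Returns two boolean masks: full cases, and patients missing exactly one modality.
    """
    available = np.asarray(available, dtype=bool)
    n_present = available.sum(axis=1)
    n_modalities = available.shape[1]
    full_cases = n_present == n_modalities
    one_missing = n_present == n_modalities - 1
    return full_cases, one_missing


def fit_linear_imputer(x_present, x_target):
    """Fit a least-squares map (with bias) from present features to the target modality.

    This is a simple stand-in for a deep imputation model, fitted on full cases.
    """
    design = np.hstack([x_present, np.ones((x_present.shape[0], 1))])
    weights, *_ = np.linalg.lstsq(design, x_target, rcond=None)
    return weights


def impute(weights, x_present):
    """Predict the missing modality's features from the present ones."""
    design = np.hstack([x_present, np.ones((x_present.shape[0], 1))])
    return design @ weights


# Toy availability table: rows are patients, columns follow MODALITIES.
toy_available = np.array(
    [[1, 1, 1], [1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 1, 1]], dtype=bool
)
full, imputable = split_cohort(toy_available)
print("full cases:", int(full.sum()), "| imputable patients:", int(imputable.sum()))
```

In this sketch, a full-case analysis would keep only the patients flagged by `full`, whereas imputing the single missing modality also recovers the patients flagged by `imputable`. Patients missing two or more modalities fall outside the setting the abstract describes.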

Code Implementations: 1 repo