Prediction approaches for partly missing multi-omics covariate data: A literature review and an empirical comparison study
This paper addresses a practical issue for researchers who use multi-omics data in predictive modeling. Its contribution is incremental, however, because it reviews and compares existing methods rather than introducing new ones.
The paper tackles outcome prediction with block-wise missing multi-omics data by reviewing existing methods and empirically comparing their performance across 13 data sets. It finds that some approaches, such as multiple imputation and matrix completion, show competitive results with specific gains in accuracy.
As the availability of omics data has increased in the last few years, more multi-omics data have been generated, that is, high-dimensional molecular data consisting of several types such as genomic, transcriptomic, or proteomic data, all obtained from the same patients. Such data lend themselves to being used as covariates in automatic outcome prediction because each omics type may contribute unique information, possibly improving predictions compared to using only one omics data type. Frequently, however, the different omics data types are not available for all patients, both in the training data and in the test data, that is, the data to which automatic prediction rules should be applied. We refer to this type of data as block-wise missing multi-omics data. First, we provide a literature review on existing prediction methods applicable to such data. Subsequently, using a collection of 13 publicly available multi-omics data sets, we compare the predictive performances of several of these approaches for different block-wise missingness patterns. Finally, we discuss the results of this empirical comparison study and draw some tentative conclusions.
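To make the notion of block-wise missingness concrete, the following minimal Python sketch groups patients by which omics blocks were measured for them. The patient IDs, block names (`rna`, `mirna`, `cnv`), feature values, and helper functions are all hypothetical and chosen only for illustration. They are not taken from the 13 data sets or from any of the methods compared in the study.

```python
from collections import defaultdict

# Hypothetical toy data: each patient maps an omics block name to a feature
# vector, or to None when the entire block was not measured for that patient.
patients = {
    "p1": {"rna": [0.2, 1.1, 0.7], "mirna": [0.5, 0.3], "cnv": [1.0, 0.0]},
    "p2": {"rna": [0.9, 0.4, 0.1], "mirna": None, "cnv": [0.0, 1.0]},
    "p3": {"rna": None, "mirna": [0.8, 0.6], "cnv": None},
    "p4": {"rna": [0.3, 0.3, 0.5], "mirna": None, "cnv": [1.0, 1.0]},
}

# Assumed fixed ordering of the omics blocks.
BLOCKS = ("rna", "mirna", "cnv")


def observed_blocks(record):
    """Return the tuple of blocks that were measured for one patient."""
    return tuple(block for block in BLOCKS if record[block] is not None)


def group_by_pattern(data):
    """Group patient IDs by their block-wise missingness pattern."""
    groups = defaultdict(list)
    for patient_id, record in data.items():
        groups[observed_blocks(record)].append(patient_id)
    return dict(groups)


patterns = group_by_pattern(patients)

# Patients with every block observed; a complete-case analysis would use
# only these and discard the rest.
complete_cases = patterns.get(BLOCKS, [])

for pattern, ids in patterns.items():
    print(pattern, ids)
```

In this toy example only one of four patients has all blocks observed, which shows why restricting the analysis to complete cases can discard most of the data and why methods that exploit partially observed patients are of interest.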