QM LG
Aug 8, 2021

MuCoMiD: A Multitask Convolutional Learning Framework for miRNA-Disease Association Prediction

arXiv:2108.04820v34.314 citations

Originality: Incremental advance

AI Analysis

This work addresses data scarcity and overfitting in computational miRNA-disease association prediction. It appears incremental: it builds on existing graph convolution methods and adds a multi-task perspective.

The paper proposes MuCoMiD, a multitask graph convolutional learning framework for predicting miRNA-disease associations that automatically extracts features from heterogeneous biological sources. Over state-of-the-art methods, it reports improvements of at least 3% in 5-fold cross-validation on the HMDDv2.0 and HMDDv3.0 benchmarks and at least 35% on larger independent test sets.

Growing evidence from recent studies implies that microRNAs (miRNAs) could serve as biomarkers in various complex human diseases. Since wet-lab experiments are expensive and time-consuming, computational techniques for miRNA-disease association prediction have attracted a lot of attention in recent years. Data scarcity is one of the major challenges in building reliable machine learning models. Data scarcity combined with the use of precalculated hand-crafted input features has led to problems of overfitting and data leakage. We overcome the limitations of existing works by proposing a novel multi-tasking graph convolution-based approach, which we refer to as MuCoMiD. MuCoMiD allows automatic feature extraction while incorporating knowledge from five heterogeneous biological information sources in a multi-task setting, a novel perspective that has not been studied before. The five sources are interactions between miRNAs and protein-coding genes (PCGs), interactions between diseases and PCGs, interactions between PCGs, miRNA family information, and disease ontology. To effectively test the generalization capability of our model, we conduct large-scale experiments on standard benchmark datasets as well as our proposed larger independent test sets and case studies. MuCoMiD shows an improvement of at least 3% in 5-fold CV evaluation on the HMDDv2.0 and HMDDv3.0 datasets and at least 35% on larger independent test sets with unseen miRNAs and diseases over state-of-the-art approaches. We share our code for reproducibility and future research at https://git.l3s.uni-hannover.de/dong/cmtt.
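
To make the two ingredients named in the abstract concrete, the following is a minimal NumPy sketch of a generic graph convolution step and a weighted multi-task loss. It is not the authors' implementation: the function names, the toy graph, the embedding sizes, and the `side_weight` parameter are all assumptions chosen for illustration, and the normalization shown is the standard symmetric one used in common GCN formulations.

```python
import numpy as np


def normalize_adjacency(adj):
    """Symmetrically normalize A + I: D^{-1/2} (A + I) D^{-1/2}."""
    a_hat = adj + np.eye(adj.shape[0])  # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(a_norm, features, weight):
    """One graph convolution: aggregate neighbours, project, apply ReLU."""
    return np.maximum(a_norm @ features @ weight, 0.0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def binary_cross_entropy(pred, target, eps=1e-9):
    """Mean binary cross-entropy between predicted scores and 0/1 labels."""
    pred = np.clip(pred, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred)))


def multitask_loss(main_loss, side_losses, side_weight=0.5):
    """Main-task loss plus a weighted sum of side-task losses (weight is assumed)."""
    return main_loss + side_weight * sum(side_losses)


# Toy example: 4 protein-coding genes on a path graph, 3 miRNAs, 2 diseases.
rng = np.random.default_rng(0)
pcg_adj = np.array([[0, 1, 0, 0],
                    [1, 0, 1, 0],
                    [0, 1, 0, 1],
                    [0, 0, 1, 0]], dtype=float)
pcg_emb = gcn_layer(normalize_adjacency(pcg_adj),
                    rng.normal(size=(4, 8)),   # hypothetical input features
                    rng.normal(size=(8, 8)))   # hypothetical layer weights

mirna_emb = rng.normal(size=(3, 8))
disease_emb = rng.normal(size=(2, 8))

# Main task: miRNA-disease association scores against toy labels.
main = binary_cross_entropy(sigmoid(mirna_emb @ disease_emb.T),
                            np.array([[1, 0], [0, 1], [0, 0]], dtype=float))
# Side tasks: miRNA-PCG and disease-PCG interaction scores against toy labels.
side_mirna = binary_cross_entropy(sigmoid(mirna_emb @ pcg_emb.T), np.zeros((3, 4)))
side_disease = binary_cross_entropy(sigmoid(disease_emb @ pcg_emb.T), np.zeros((2, 4)))

total = multitask_loss(main, [side_mirna, side_disease])
```

In this sketch the side tasks share the gene embeddings with the main task, which is one common way a multi-task setup can regularize a model when labelled associations are scarce. How MuCoMiD itself combines its tasks is described in the paper and the linked repository.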
