GN AI LG ML Jun 14, 2020

Multiclass Disease Predictions Based on Integrated Clinical and Genomics Datasets

arXiv:2006.07879v1
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for more comprehensive data integration in precision medicine, though it is incremental as it applies existing machine learning methods to a combined dataset.

The paper tackled the problem of predicting multiple diseases by integrating clinical and genomics datasets, achieving 73% accuracy in multiclass classification across 75 disease classes using an instance-based learner with PCA for feature selection.

Computational clinical predictions based on clinical data are common in bioinformatics. However, clinical predictions that also use information from genomics datasets are rarely seen in research. Precision medicine research requires information from all available datasets to provide intelligent clinical solutions. In this paper, we have attempted to create a prediction model which uses information from both clinical and genomics datasets. We have demonstrated multiclass disease predictions based on combined clinical and genomics datasets using machine learning methods. We have created an integrated dataset from a clinical (ClinVar) and a genomics (gene expression) dataset, and trained an instance-based learner on it to predict clinical diseases. We have used an innovative but simple way for multiclass classification, where the number of output classes is as high as 75. We have used Principal Component Analysis for feature selection. The classifier predicted diseases with 73% accuracy on the integrated dataset. The results were consistent and competitive when compared with other classification models. The results show that genomics information can be reliably included in datasets for clinical predictions and that it can prove valuable in clinical diagnostics and precision medicine.
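The pipeline described in the abstract (reduce the features with Principal Component Analysis, then classify with an instance-based learner over many classes) can be illustrated with a minimal sketch. This is not the authors' implementation: the data below are synthetic stand-ins for the integrated ClinVar and gene expression dataset, a k-nearest-neighbour rule is assumed as one common kind of instance-based learner, and the helper names (`pca_fit`, `pca_transform`, `knn_predict`), the 10 retained components, and the 80/20 split are all hypothetical choices made for illustration.

```python
import numpy as np


def pca_fit(X, n_components):
    """Return the training mean and the top principal axes (as rows) of X."""
    mean = X.mean(axis=0)
    # SVD of the centered data: the rows of vt are the principal axes,
    # ordered by decreasing explained variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def pca_transform(X, mean, components):
    """Project X onto the principal axes learned from the training data."""
    return (X - mean) @ components.T


def knn_predict(X_train, y_train, X_test, k=1):
    """Instance-based prediction: majority vote among the k nearest training rows."""
    # Pairwise squared Euclidean distances, shape (n_test, n_train).
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    votes = y_train[nearest]
    return np.array([np.bincount(row).argmax() for row in votes])


# Synthetic stand-in for the integrated dataset (assumption, not the paper's data):
# 75 classes, each a noisy cluster around its own random center.
rng = np.random.default_rng(0)
n_classes, per_class, n_features = 75, 20, 50
centers = rng.normal(scale=5.0, size=(n_classes, n_features))
X = np.repeat(centers, per_class, axis=0) + rng.normal(
    size=(n_classes * per_class, n_features)
)
y = np.repeat(np.arange(n_classes), per_class)

# Shuffle, then hold out 20% of the rows for evaluation.
order = rng.permutation(len(y))
split = int(0.8 * len(y))
train, test = order[:split], order[split:]

# Fit PCA on the training rows only, then project both splits.
mean, components = pca_fit(X[train], n_components=10)
Z_train = pca_transform(X[train], mean, components)
Z_test = pca_transform(X[test], mean, components)

# Classify each held-out row by its nearest training instance.
y_pred = knn_predict(Z_train, y[train], Z_test, k=1)
accuracy = float((y_pred == y[test]).mean())
print(f"accuracy on synthetic data: {accuracy:.2f}")
```

The accuracy printed here reflects only how well separated the synthetic clusters are and says nothing about the 73% figure reported for the real integrated dataset. Fitting the projection on the training split alone keeps information from the held-out rows out of the reduced features.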

