LGQMMay 10, 2025

GBDTSVM: Combined Support Vector Machine and Gradient Boosting Decision Tree Framework for efficient snoRNA-disease association prediction

arXiv:2505.06534v16 citationsh-index: 3Has CodeComput. Biol. Medicine
Originality Incremental advance
AI Analysis

This provides a computational tool for researchers in bioinformatics to efficiently identify snoRNA-disease links, reducing reliance on costly experimental methods, but it is incremental as it builds on existing machine learning techniques.

The paper tackled the problem of predicting snoRNA-disease associations by proposing GBDTSVM, a machine learning model combining Gradient Boosting Decision Tree and Support Vector Machine, which achieved an AUROC of 0.96 and AUPRC of 0.95 on the MDRF dataset, outperforming state-of-the-art methods.

Small nucleolar RNAs (snoRNAs) are increasingly recognized for their critical role in the pathogenesis and characterization of various human diseases. Consequently, the precise identification of snoRNA-disease associations (SDAs) is essential for the progression of diseases and the advancement of treatment strategies. However, conventional biological experimental approaches are costly, time-consuming, and resource-intensive; therefore, machine learning-based computational methods offer a promising solution to mitigate these limitations. This paper proposes a model called 'GBDTSVM', representing a novel and efficient machine learning approach for predicting snoRNA-disease associations by leveraging a Gradient Boosting Decision Tree (GBDT) and Support Vector Machine (SVM). 'GBDTSVM' effectively extracts integrated snoRNA-disease feature representations utilizing GBDT and SVM is subsequently utilized to classify and identify potential associations. Furthermore, the method enhances the accuracy of these predictions by incorporating Gaussian kernel profile similarity for both snoRNAs and diseases. Experimental evaluation of the GBDTSVM model demonstrated superior performance compared to state-of-the-art methods in the field, achieving an area under the receiver operating characteristic (AUROC) of 0.96 and an area under the precision-recall curve (AUPRC) of 0.95 on MDRF dataset. Moreover, our model shows superior performance on two more datasets named LSGT and PsnoD. Additionally, a case study on the predicted snoRNA-disease associations verified the top 10 predicted snoRNAs across nine prevalent diseases, further validating the efficacy of the GBDTSVM approach. These results underscore the model's potential as a robust tool for advancing snoRNA-related disease research. Source codes and datasets our proposed framework can be obtained from: https://github.com/mariamuna04/gbdtsvm

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes