BMLGNov 2, 2023

A novel RNA pseudouridine site prediction model using Utility Kernel and data-driven parameters

arXiv:2311.16132v1h-index: 8
Originality Incremental advance
AI Analysis

This work addresses a domain-specific challenge in bioinformatics for researchers studying RNA modifications, offering an incremental improvement in prediction accuracy.

The paper tackled the problem of predicting pseudouridine sites in RNA sequences with limited data by proposing a Support Vector Machine kernel based on utility theory and data-driven features, achieving a 10-15% average improvement over existing state-of-the-art models.

RNA protein Interactions (RPIs) play an important role in biological systems. Recently, we have enumerated the RPIs at the residue level and have elucidated the minimum structural unit (MSU) in these interactions to be a stretch of five residues (Nucleotides/amino acids). Pseudouridine is the most frequent modification in RNA. The conversion of uridine to pseudouridine involves interactions between pseudouridine synthase and RNA. The existing models to predict the pseudouridine sites in a given RNA sequence mainly depend on user-defined features such as mono and dinucleotide composition/propensities of RNA sequences. Predicting pseudouridine sites is a non-linear classification problem with limited data points. Deep Learning models are efficient discriminators when the data set size is reasonably large and fail when there is a paucity of data ($<1000$ samples). To mitigate this problem, we propose a Support Vector Machine (SVM) Kernel based on utility theory from Economics, and using data-driven parameters (i.e. MSU) as features. For this purpose, we have used position-specific tri/quad/pentanucleotide composition/propensity (PSPC/PSPP) besides nucleotide and dineculeotide composition as features. SVMs are known to work well in small data regimes and kernels in SVM are designed to classify non-linear data. The proposed model outperforms the existing state-of-the-art models significantly (10%-15% on average).

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes