AI LG BM | Oct 21, 2024

Comprehensive benchmarking of large language models for RNA secondary structure prediction

L. I. Zablocki, L. A. Bugnon, M. Gerard, L. Di Persia, G. Stegmayer, D. H. Milone

arXiv:2410.16212v25.822 citations | h-index: 31 | Has Code | Briefings Bioinform.

Originality Synthesis-oriented

AI Analysis

This work provides a comparative analysis for researchers in computational biology, but it is incremental as it applies existing methods to a specific domain task.

The authors benchmarked several pre-trained large language models (LLMs) for RNA secondary structure prediction, finding that two LLMs clearly outperformed the others and that generalization remained challenging in low-homology scenarios.

Inspired by the success of large language models (LLMs) for DNA and proteins, several LLMs for RNA have been developed recently. RNA-LLMs use large datasets of RNA sequences to learn, in a self-supervised way, how to represent each RNA base with a semantically rich numerical vector. This is done under the hypothesis that obtaining high-quality RNA representations can enhance data-costly downstream tasks. Among these tasks, predicting the secondary structure is fundamental for uncovering RNA functional mechanisms. In this work we present a comprehensive experimental analysis of several pre-trained RNA-LLMs, comparing them for the RNA secondary structure prediction task in a unified deep learning framework. The RNA-LLMs were assessed with increasing generalization difficulty on benchmark datasets. Results showed that two LLMs clearly outperform the other models, and revealed significant challenges for generalization in low-homology scenarios.
