LGJul 11, 2025

CircFormerMoE: An End-to-End Deep Learning Framework for Circular RNA Splice Site Detection and Pairing in Plant Genomes

arXiv:2507.08542v1
Originality Highly original
AI Analysis

This provides a computational tool for plant biologists to efficiently discover circRNAs, addressing a bottleneck in plant functional genomics where existing methods are inefficient and lack generalization.

The paper tackles the problem of identifying circular RNAs (circRNAs) in plant genomes by developing CircFormerMoE, a deep learning framework that predicts circRNAs directly from genomic DNA sequences without requiring RNA-seq data, achieving fast and accurate large-scale prediction validated across 10 plant species.

Circular RNAs (circRNAs) are important components of the non-coding RNA regulatory network. Previous circRNA identification primarily relies on high-throughput RNA sequencing (RNA-seq) data combined with alignment-based algorithms that detect back-splicing signals. However, these methods face several limitations: they can't predict circRNAs directly from genomic DNA sequences and relies heavily on RNA experimental data; they involve high computational costs due to complex alignment and filtering steps; and they are inefficient for large-scale or genome-wide circRNA prediction. The challenge is even greater in plants, where plant circRNA splice sites often lack the canonical GT-AG motif seen in human mRNA splicing, and no efficient deep learning model with strong generalization capability currently exists. Furthermore, the number of currently identified plant circRNAs is likely far lower than their true abundance. In this paper, we propose a deep learning framework named CircFormerMoE based on transformers and mixture-of experts for predicting circRNAs directly from plant genomic DNA. Our framework consists of two subtasks known as splicing site detection (SSD) and splicing site pairing (SSP). The model's effectiveness has been validated on gene data of 10 plant species. Trained on known circRNA instances, it is also capable of discovering previously unannotated circRNAs. In addition, we performed interpretability analyses on the trained model to investigate the sequence patterns contributing to its predictions. Our framework provides a fast and accurate computational method and tool for large-scale circRNA discovery in plants, laying a foundation for future research in plant functional genomics and non-coding RNA annotation.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes