GN | AI | Jun 22, 2025

eccDNAMamba: A Pre-Trained Model for Ultra-Long eccDNA Sequence Analysis

arXiv:2506.18940v1 | 1 citation | h-index: 1 | Has Code
Originality: Highly original
AI Analysis

eccDNAMamba provides a robust and efficient framework for modeling circular genomes, addressing a specific bottleneck in genomic analysis for cancer research: the absence of pre-trained models that handle full-length eccDNA.

The paper tackles the lack of pre-trained models for analyzing full-length extrachromosomal circular DNA (eccDNA) sequences, which play an important role in cancer regulation. It introduces eccDNAMamba, a bidirectional state-space encoder that achieves strong classification performance and scales to sequences up to 200 Kbp.

Extrachromosomal circular DNA (eccDNA) plays key regulatory roles and contributes to oncogene overexpression in cancer through high-copy amplification and long-range interactions. Despite advances in modeling, no pre-trained models currently support full-length circular eccDNA for downstream analysis. Existing genomic models are either limited to single-nucleotide resolution or hindered by the inefficiency of the quadratic attention mechanism. Here, we introduce eccDNAMamba, the first bidirectional state-space encoder tailored for circular DNA sequences. It combines forward and reverse passes for full-context representation learning with linear-time complexity, and preserves circular structure through a novel augmentation strategy. Tested on two real-world datasets, eccDNAMamba achieves strong classification performance and scales to sequences up to 200 Kbp, offering a robust and efficient framework for modeling circular genomes. Our code is available at https://github.com/zzq1zh/GenAI-Lab.
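The two ideas in the abstract, a circularity-preserving augmentation and a forward-plus-reverse linear-time pass, can be illustrated with a toy sketch. The abstract does not describe the actual augmentation or the model internals, so everything here is an assumption for illustration: `circular_pad` uses wrap-around padding (appending the head of the sequence to its tail so the junction appears in context), and `linear_scan` is a fixed-decay scalar recurrence rather than Mamba's selective, input-dependent state-space layer. All function names are hypothetical and are not taken from the authors' repository.

```python
from typing import List

# Hypothetical numeric encoding of bases, for illustration only.
ENCODING = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}


def circular_pad(seq: str, overlap: int) -> str:
    """Append the first `overlap` bases to the end of `seq`.

    Assumed augmentation (not confirmed by the source): a linear model
    reading the padded string sees the bases that flank the junction
    of the circle next to each other.
    """
    if overlap < 0:
        raise ValueError("overlap must be non-negative")
    overlap = min(overlap, len(seq))
    return seq + seq[:overlap]


def encode(seq: str) -> List[float]:
    """Map each base to a number using the toy ENCODING table."""
    return [ENCODING[base] for base in seq]


def linear_scan(xs: List[float], decay: float) -> List[float]:
    """Toy state-space recurrence h_t = decay * h_{t-1} + x_t.

    One pass over the input, so the cost grows linearly with length,
    unlike pairwise attention, which grows quadratically.
    """
    h = 0.0
    out = []
    for x in xs:
        h = decay * h + x
        out.append(h)
    return out


def bidirectional_scan(xs: List[float], decay: float) -> List[float]:
    """Sum a forward scan and a reverse scan at each position.

    Each output then depends on the whole sequence. Note that x_t is
    counted in both directions in this simplified version.
    """
    forward = linear_scan(xs, decay)
    backward = linear_scan(xs[::-1], decay)[::-1]
    return [f + b for f, b in zip(forward, backward)]


if __name__ == "__main__":
    padded = circular_pad("ACGT", overlap=2)
    print(padded)
    print(bidirectional_scan(encode(padded), decay=0.5))
```

In this sketch the padding length and the decay are free parameters; a real model would learn its state-space parameters and would choose the overlap to suit its tokenizer and context length.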

Code Implementations: 1 repo
