LG · GN · Jun 2, 2025

SPACE: Your Genomic Profile Predictor is a Powerful DNA Foundation Model

arXiv:2506.01833v1 · 4 citations · h-index: 2 · Has Code · ICML
Originality: Highly original
AI Analysis

This work addresses the challenge of learning effective DNA representations, arguing that sequence-only unsupervised pre-training is suboptimal because DNA function is regulated by genomic profiles such as chromatin accessibility.

The paper tackles the problem of suboptimal DNA pre-training by proposing supervised training for genomic profile prediction as a more effective alternative, and introduces SPACE, a model that achieves state-of-the-art performance across various tasks.

Inspired by the success of unsupervised pre-training paradigms, researchers have applied these approaches to DNA pre-training. However, we argue that these approaches alone yield suboptimal results because pure DNA sequences lack sufficient information, since their functions are regulated by genomic profiles like chromatin accessibility. Here, we demonstrate that supervised training for genomic profile prediction serves as a more effective alternative to pure sequence pre-training. Furthermore, considering the multi-species and multi-profile nature of genomic profile prediction, we introduce our $\textbf{S}$pecies-$\textbf{P}$rofile $\textbf{A}$daptive $\textbf{C}$ollaborative $\textbf{E}$xperts (SPACE) that leverages Mixture of Experts (MoE) to better capture the relationships between DNA sequences across different species and genomic profiles, thereby learning more effective DNA representations. Through extensive experiments across various tasks, our model achieves state-of-the-art performance, establishing that DNA models trained with supervised genomic profiles serve as powerful DNA representation learners. The code is available at https://github.com/ZhuJiwei111/SPACE.
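To make the Mixture of Experts idea in the abstract concrete, here is a minimal toy sketch of a sparsely gated MoE layer whose routing depends on both the input features and a species index. It is an illustration under stated assumptions, not the SPACE architecture: the class name `ToyMoE`, the per-species gate bias, the linear experts, and the top-k routing rule are all hypothetical choices made for this example, and the abstract does not specify any of them.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def random_matrix(rng, rows, cols):
    """Small random weights; a stand-in for learned parameters."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class ToyMoE:
    """Hypothetical species-conditioned MoE layer (not the paper's design)."""

    def __init__(self, dim, n_experts, n_species, top_k=2, seed=0):
        rng = random.Random(seed)
        self.top_k = top_k
        # Each expert is a single linear map dim -> dim (assumption).
        self.experts = [random_matrix(rng, dim, dim) for _ in range(n_experts)]
        # Gate scores experts from the input features.
        self.gate = random_matrix(rng, n_experts, dim)
        # Assumed per-species bias that shifts which experts are preferred.
        self.species_bias = random_matrix(rng, n_species, n_experts)

    def route(self, x, species):
        """Return {expert_index: weight} for the top-k experts."""
        feature_scores = matvec(self.gate, x)
        logits = [g + b for g, b in zip(feature_scores, self.species_bias[species])]
        ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
        top = ranked[: self.top_k]
        weights = softmax([logits[i] for i in top])
        return dict(zip(top, weights))

    def forward(self, x, species):
        """Weighted sum of the selected experts' outputs."""
        out = [0.0] * len(x)
        for idx, weight in self.route(x, species).items():
            expert_out = matvec(self.experts[idx], x)
            out = [o + weight * v for o, v in zip(out, expert_out)]
        return out


moe = ToyMoE(dim=4, n_experts=3, n_species=2)
features = [0.5, -1.0, 0.25, 2.0]  # stand-in for an encoded DNA window
routing_species_0 = moe.route(features, species=0)
output_species_1 = moe.forward(features, species=1)
```

The same input can be routed to different experts depending on the species index, which is one simple way a shared model could specialize across species while still sharing parameters. For the actual design, see the repository linked in the abstract.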

Code Implementations: 1 repo