AI · GN · Feb 19

JEPA-DNA: Grounding Genomic Foundation Models through Joint-Embedding Predictive Architectures

arXiv:2602.17162v1 · 7 citations · h-index: 25
Originality: Incremental advance
AI Analysis

This work addresses the need for more biologically grounded genomic foundation models, offering a scalable path for better understanding genomic sequences, though it appears incremental as it extends existing paradigms.

The paper tackles the problem that genomic foundation models often fail to capture broader functional context. It introduces JEPA-DNA, which combines a joint-embedding predictive architecture with generative objectives, and reports better results than generative-only baselines on supervised and zero-shot genomic benchmarks.

Genomic Foundation Models (GFMs) have largely relied on Masked Language Modeling (MLM) or Next Token Prediction (NTP) to learn the language of life. While these paradigms excel at capturing local genomic syntax and fine-grained motif patterns, they often fail to capture the broader functional context, resulting in representations that lack a global biological perspective. We introduce JEPA-DNA, a novel pre-training framework that integrates the Joint-Embedding Predictive Architecture (JEPA) with traditional generative objectives. JEPA-DNA introduces latent grounding: it couples token-level recovery with a predictive objective in the latent space, applied by supervising a CLS token. This forces the model to predict the high-level functional embeddings of masked genomic segments rather than focusing solely on individual nucleotides. JEPA-DNA extends both the NTP and MLM paradigms and can be deployed either as a standalone from-scratch objective or as a continual pre-training enhancement for existing GFMs. Our evaluations across a diverse suite of genomic benchmarks demonstrate that JEPA-DNA consistently yields superior performance in supervised and zero-shot tasks compared to generative-only baselines. By providing a more robust and biologically grounded representation, JEPA-DNA offers a scalable path toward foundation models that understand not only the genomic alphabet, but also the underlying functional logic of the sequence.
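
As a rough illustration of the coupling the abstract describes, the sketch below adds a latent-space term on a CLS embedding to a masked-token recovery term. It is not the authors' implementation. The function names, the mean-squared-error distance, the `latent_weight` parameter, and the weighted-sum form are all assumptions made for illustration; the abstract does not specify how the target embedding is produced, so it is taken here as a given input.

```python
import math


def masked_token_loss(logits, targets, mask):
    """Mean cross-entropy over masked positions only (MLM-style recovery term)."""
    total, count = 0.0, 0
    for row, target, is_masked in zip(logits, targets, mask):
        if not is_masked:
            continue
        peak = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_norm = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_norm - row[target]
        count += 1
    return total / count if count else 0.0


def latent_loss(predicted_cls, target_cls):
    """Mean squared error between predicted and target CLS embeddings.

    MSE is an assumption; the abstract does not name the distance used.
    """
    squared = [(p - t) ** 2 for p, t in zip(predicted_cls, target_cls)]
    return sum(squared) / len(squared)


def combined_loss(logits, targets, mask, predicted_cls, target_cls,
                  latent_weight=1.0):
    """Token-level term plus a weighted latent term (hypothetical weighting)."""
    token_term = masked_token_loss(logits, targets, mask)
    latent_term = latent_loss(predicted_cls, target_cls)
    return token_term + latent_weight * latent_term


# Toy example: two positions over the vocabulary (A, C, G, T), first one masked.
logits = [[0.0, 0.0, 0.0, 0.0], [5.0, 0.0, 0.0, 0.0]]
targets = [2, 0]            # true nucleotide index at each position
mask = [True, False]        # only masked positions contribute to the token term
predicted_cls = [1.0, 2.0]  # embedding predicted from the masked input
target_cls = [1.0, 4.0]     # embedding of the masked segment, assumed given

loss = combined_loss(logits, targets, mask, predicted_cls, target_cls,
                     latent_weight=0.5)
```

With a next-token objective instead of masking, the same structure would apply with the token term replaced by a causal cross-entropy; that variant is not shown here.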

