Mar 20, 2025

Gene42: Long-Range Genomic Foundation Model With Dense Attention

arXiv:2503.16565v1 | 2 citations | h-index: 25 | Has Code
Originality: Highly original
AI Analysis

This work addresses the problem of processing long genomic sequences, a known bottleneck for researchers in genomics and bioinformatics, and proposes a novel method rather than an incremental improvement.

The authors tackled the challenge of modeling long-range genomic sequences up to 192,000 base pairs at single-nucleotide resolution, achieving state-of-the-art performance on tasks like biotype classification and variant pathogenicity prediction with notably low perplexity and high reconstruction accuracy.

We introduce Gene42, a novel family of Genomic Foundation Models (GFMs) designed to manage context lengths of up to 192,000 base pairs (bp) at single-nucleotide resolution. Gene42 models utilize a decoder-only (LLaMA-style) architecture with a dense self-attention mechanism. Initially trained on fixed-length sequences of 4,096 bp, our models underwent continuous pretraining to extend the context length to 192,000 bp. This iterative extension allowed for the comprehensive processing of large-scale genomic data and the capture of intricate patterns and dependencies within the human genome. Gene42 is the first dense attention model capable of handling such long context lengths in genomics, challenging state-space models that often rely on convolutional operators among other mechanisms. Our pretrained models exhibit notably low perplexity values and high reconstruction accuracy, highlighting their strong ability to model genomic data. Extensive experiments on various genomic benchmarks have demonstrated state-of-the-art performance across multiple tasks, including biotype classification, regulatory region identification, chromatin profiling prediction, variant pathogenicity prediction, and species classification. The models are publicly available at huggingface.co/inceptionai.
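Two pieces of arithmetic behind the abstract can be sketched in a few lines of Python: how much work dense causal attention takes on when the context grows from 4,096 bp to 192,000 bp, and what a perplexity value means at single-nucleotide resolution. This is an illustrative sketch, not the authors' code. The function names are hypothetical, and the four-letter vocabulary is an assumption (the actual Gene42 tokenizer may include additional symbols).

```python
import math

# Assumed vocabulary: one token per nucleotide. The real tokenizer may differ.
NUCLEOTIDES = "ACGT"


def causal_attention_pairs(context_len: int) -> int:
    """Count the (query, key) pairs one causal dense-attention head scores.

    Each position attends to itself and to every earlier position,
    so the count is 1 + 2 + ... + context_len.
    """
    return context_len * (context_len + 1) // 2


def perplexity(token_log_probs: list[float]) -> float:
    """Perplexity = exp of the mean negative natural-log likelihood per token."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# Context lengths quoted in the abstract.
short_ctx, long_ctx = 4_096, 192_000
growth = causal_attention_pairs(long_ctx) / causal_attention_pairs(short_ctx)

# A model that guesses uniformly over the assumed 4 nucleotides
# assigns log(1/4) to every token.
uniform_log_probs = [math.log(1 / len(NUCLEOTIDES))] * 8
baseline_ppl = perplexity(uniform_log_probs)

print(f"attention pairs grow by a factor of about {growth:.0f}")
print(f"uniform-guess perplexity: {baseline_ppl:.2f}")
```

Under these assumptions, a roughly 47-fold increase in context length means over two thousand times as many attention scores per head, which is why dense attention at this scale is notable. A uniform guess over four nucleotides gives a perplexity of 4, so lower values indicate that a model has learned structure in the sequence.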

