GNAICELG | May 19, 2025

ChromFound: Towards A Universal Foundation Model for Single-Cell Chromatin Accessibility Data

arXiv:2505.12638v3 | 2 citations | h-index: 8
Originality: Incremental advance
AI Analysis

ChromFound provides a universal framework for analyzing scATAC-seq data that enables zero-shot cell identification and multi-omics analysis. The advance is incremental because it adapts established foundation-model concepts to a new data modality.

The paper tackles the lack of a foundation model for single-cell chromatin accessibility data (scATAC-seq) by introducing ChromFound, which achieves robust zero-shot performance in generating universal cell representations and demonstrates excellent transferability across tasks like cell type annotation and cross-omics prediction.

The advent of single-cell Assay for Transposase-Accessible Chromatin using sequencing (scATAC-seq) offers an innovative perspective for deciphering regulatory mechanisms by assembling a vast repository of single-cell chromatin accessibility data. While foundation models have achieved significant success in single-cell transcriptomics, there is currently no foundation model for scATAC-seq that supports zero-shot high-quality cell identification and comprehensive multi-omics analysis simultaneously. Key challenges lie in the high dimensionality and sparsity of scATAC-seq data, as well as the lack of a standardized schema for representing open chromatin regions (OCRs). Here, we present ChromFound, a foundation model tailored for scATAC-seq. ChromFound utilizes a hybrid architecture and genome-aware tokenization to effectively capture genome-wide long contexts and regulatory signals from dynamic chromatin landscapes. Pretrained on 1.97 million cells from 30 tissues and 6 disease conditions, ChromFound demonstrates broad applicability across 6 diverse tasks. Notably, it achieves robust zero-shot performance in generating universal cell representations and exhibits excellent transferability in cell type annotation and cross-omics prediction. By uncovering enhancer-gene links undetected by existing computational methods, ChromFound offers a promising framework for understanding disease risk variants in the noncoding genome.
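The abstract names genome-aware tokenization but does not specify it. The sketch below is a rough illustration of what a genome-aware token for scATAC-seq could carry. It is not ChromFound's scheme. It assumes each OCR is a (chromosome, start, end) interval and turns one cell's peak counts into tokens that hold a chromosome index, the region midpoint, and the accessibility value. It keeps only nonzero regions, which is one simple way to sidestep the sparsity the abstract mentions. The names `OCR`, `Token`, `tokenize_cell`, and the chromosome vocabulary are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OCR:
    """Hypothetical open chromatin region: a half-open genomic interval."""
    chrom: str
    start: int
    end: int

@dataclass(frozen=True)
class Token:
    """Hypothetical genome-aware token for one accessible region in one cell."""
    chrom_id: int   # index of the chromosome in the assumed vocabulary
    midpoint: int   # centre of the OCR in base pairs
    value: float    # accessibility signal (raw count here, for simplicity)

# Assumed chromosome vocabulary: human autosomes plus X and Y.
CHROMS = [f"chr{i}" for i in range(1, 23)] + ["chrX", "chrY"]
CHROM_INDEX = {c: i for i, c in enumerate(CHROMS)}

def tokenize_cell(peaks, counts):
    """Convert one cell's per-peak counts into genome-ordered tokens.

    Zero-count peaks are dropped, so the token sequence length scales with
    the number of accessible regions rather than the full peak set.
    """
    tokens = []
    for ocr, count in zip(peaks, counts):
        if count == 0:
            continue  # inaccessible in this cell: no token
        if ocr.chrom not in CHROM_INDEX:
            continue  # contig outside the assumed vocabulary
        midpoint = (ocr.start + ocr.end) // 2
        tokens.append(Token(CHROM_INDEX[ocr.chrom], midpoint, float(count)))
    # Sort by genomic coordinate so neighbouring regions stay adjacent.
    tokens.sort(key=lambda t: (t.chrom_id, t.midpoint))
    return tokens

# Usage: four peaks, one with zero counts and one on an unplaced contig.
peaks = [
    OCR("chr2", 1000, 1500),
    OCR("chr1", 5000, 5400),
    OCR("chr1", 200, 600),
    OCR("chrUn_x", 0, 100),
]
counts = [3, 0, 1, 2]
tokens = tokenize_cell(peaks, counts)
print(tokens)
```

Carrying explicit coordinates in each token, rather than a bare peak ID, is one way to avoid depending on a fixed peak set. This is relevant to the abstract's point that no standardized schema for OCRs exists. How ChromFound actually encodes position and signal should be checked in the paper itself.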
