AI · Apr 16

Intermediate Layers Encode Optimal Biological Representations in Single-Cell Foundation Models

arXiv:2604.14838 · 53.4 · h-index: 37
Predicted impact: top 70% in AI · last 90 days · Originality: Synthesis-oriented
AI Analysis

For researchers using single-cell foundation models, this work demonstrates that layer selection is critical for performance, providing a simple but overlooked improvement over current practices.

This paper shows that optimal feature representations in single-cell foundation models are task- and context-dependent. Intermediate layers outperform final layers by up to 31% in trajectory inference, and the best layer for perturbation prediction shifts with T cell activation state. Both findings challenge the default use of final-layer embeddings.

Current single-cell foundation model benchmarks universally extract final-layer embeddings, assuming these represent optimal feature spaces. We systematically evaluate layer-wise representations from scFoundation (100M parameters) and Tahoe-X1 (1.3B parameters) across trajectory inference and perturbation response prediction. Our analysis reveals that optimal layers are task-dependent (trajectory inference peaks at 60% depth, 31% above final layers) and context-dependent (the optimal perturbation layer shifts from 0% to 96% depth across T cell activation states). Notably, first-layer embeddings outperform all deeper layers in quiescent cells, challenging assumptions about hierarchical feature abstraction. These findings demonstrate that "where" to extract features matters as much as "what" the model learns, necessitating systematic layer evaluation tailored to biological task and cellular context rather than defaulting to final-layer embeddings.
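A systematic layer sweep of the kind the abstract argues for can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' pipeline. It assumes per-layer cell embeddings have already been extracted (one cells-by-features array per layer) and uses leave-one-out kNN regression of a continuous label such as pseudotime as a stand-in trajectory score. Relative depth is defined here as layer index / (number of layers - 1), and gain is the relative improvement of the best layer over the final layer. The functions `knn_label_score` and `layer_sweep` and the synthetic data are hypothetical.

```python
import numpy as np


def knn_label_score(emb: np.ndarray, labels: np.ndarray, k: int = 10) -> float:
    """Hypothetical trajectory score: leave-one-out kNN regression of a
    continuous label (e.g. pseudotime), reported as the Pearson correlation
    between each cell's label and the mean label of its k nearest neighbours."""
    sq = np.sum(emb ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * emb @ emb.T  # squared Euclidean distances
    np.fill_diagonal(d2, np.inf)  # a cell may not be its own neighbour
    nn = np.argsort(d2, axis=1)[:, :k]
    pred = labels[nn].mean(axis=1)
    return float(np.corrcoef(pred, labels)[0, 1])


def layer_sweep(layer_embs, labels, score_fn=knn_label_score) -> dict:
    """Score every layer and compare the best one against the final layer.

    Assumed conventions: relative depth = index / (n_layers - 1), so the first
    layer is 0.0 and the final layer is 1.0; gain is (best - final) / |final|.
    """
    scores = np.array([score_fn(e, labels) for e in layer_embs], dtype=float)
    best = int(np.argmax(scores))
    n_layers = len(scores)
    depth = best / (n_layers - 1) if n_layers > 1 else 1.0
    final = scores[-1]
    gain = float((scores[best] - final) / abs(final))
    return {"scores": scores, "best_layer": best,
            "relative_depth": depth, "gain_over_final": gain}


# Synthetic demo: 6 "layers" whose pseudotime signal is built to peak at
# index 3 (relative depth 0.6 under the convention above).
rng = np.random.default_rng(0)
n_cells, dim, n_layers = 300, 4, 6
pseudotime = rng.uniform(0.0, 1.0, n_cells)
layers = []
for i in range(n_layers):
    strength = 1.0 - abs(i - 3) / 3  # 0.0 at layer 0, 1.0 at layer 3
    emb = rng.normal(size=(n_cells, dim))
    emb[:, 0] += 8.0 * strength * pseudotime  # inject the label into one feature
    layers.append(emb)

result = layer_sweep(layers, pseudotime)
print(result["best_layer"], result["relative_depth"], round(result["gain_over_final"], 3))
```

Swapping `score_fn` for a perturbation-response metric, and running the sweep separately for each cellular context (for example, each T cell activation state), mirrors the task- and context-dependent comparison described above. If any layer scores NaN (for instance, from constant predictions), `np.argmax` will select it, so such layers should be filtered out first.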
