CE · Jun 2

When Does Structure Help? The Information Bonus of AlphaFold2 Representations over Protein Language Models

arXiv:2606.04228 · 18.7
Predicted impact: top 68% in CE · last 90 days
Originality: Incremental advance
AI Analysis

Provides a measurable decision framework for selecting between structural and sequence-only representations in AI-for-science protein pipelines.

The authors introduce the information bonus (IB) metric to compare frozen AlphaFold2 and ESM-2 representations across three protein tasks. They find that ESM-2 outperforms on binding affinity and flexibility, while AlphaFold2 helps only for allosteric-site classification. They also identify a residue-level leakage artifact: naive residue splits inflate RMSF performance by 27-39%, enough to reverse representation rankings.

AI scientist systems increasingly choose biological foundation models before they choose experiments. In protein pipelines, this creates a concrete engineering and scientific question: when is the cost of structural inference worth paying over a cheaper sequence-only model? We introduce the information bonus (IB), a task-level metric that measures the linearly accessible advantage of frozen single-sequence AlphaFold2 Evoformer representations over frozen ESM-2 embeddings under protein-level cross-validation. Across binding affinity regression (PDBbind, n=5,680), conformational flexibility (ATLAS molecular dynamics, 268 proteins), and allosteric-site classification (AlloSigDB, n=9,925 residues), IB is sharply mechanism-dependent. ESM-2 dominates binding affinity (IB=-0.141; Pearson r=0.449 vs. 0.307) and binary flexibility (IB=-0.060; AUROC 0.824 vs. 0.764; p=0.0017). AF2 single representations give the only above-chance allostery predictions (IB=+0.064; AUROC 0.548 vs. 0.485), revealing long-range geometric signal not recovered from sequence alone. We also identify a residue-level leakage artifact: naive residue splits inflate RMSF performance by 27-39% depending on the representation, enough to reverse representation rankings. These results turn representation selection into a measurable decision for AI-for-science systems.
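The abstract does not spell out the IB formula or the splitting code, so the following is only a minimal sketch under stated assumptions. It assumes IB is the cross-validated metric for AF2 minus the same metric for ESM-2, which agrees with the reported figures to within rounding of the printed metrics (for example, 0.764 - 0.824 = -0.060 for binary flexibility). The function names `protein_level_folds` and `information_bonus`, the toy protein IDs, and the fold count are all hypothetical. The sketch also shows the splitting discipline the leakage finding calls for: every residue of a protein lands in the same fold, so no protein contributes residues to both training and test data.

```python
import random
from collections import defaultdict
from statistics import mean


def protein_level_folds(protein_ids, k=5, seed=0):
    """Group residue indices into k folds so a protein never spans two folds."""
    unique = sorted(set(protein_ids))
    random.Random(seed).shuffle(unique)
    # Each protein (not each residue) is assigned to exactly one fold.
    fold_of = {pid: i % k for i, pid in enumerate(unique)}
    folds = defaultdict(list)
    for idx, pid in enumerate(protein_ids):
        folds[fold_of[pid]].append(idx)
    return [folds[f] for f in range(k)]


def information_bonus(af2_scores, esm2_scores):
    """Assumed definition: mean per-fold AF2 metric minus mean per-fold ESM-2 metric."""
    return mean(af2_scores) - mean(esm2_scores)


# Toy residue table: 6 hypothetical proteins with 4 residues each.
protein_ids = [f"P{i}" for i in range(6) for _ in range(4)]
folds = protein_level_folds(protein_ids, k=3)

# Check that no protein appears in more than one fold.
seen = [set(protein_ids[i] for i in fold) for fold in folds]
assert all(a.isdisjoint(b) for n, a in enumerate(seen) for b in seen[n + 1:])

# Single-number example using the reported binary-flexibility AUROCs.
ib_flex = information_bonus([0.764], [0.824])
print(round(ib_flex, 3))
```

A naive residue-level split would instead shuffle the residue indices directly, letting neighbouring residues of one protein fall on both sides of the split; that is the setup the abstract reports as inflating RMSF performance.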

