BM LG, May 12

Learning Protein Structure-Function Relationships through Knowledge-guided Representation Decomposition

Mingqing Wang, Zhiwei Nie, Athanasios V. Vasilakos, Yonghong He, Zhixiang Ren

arXiv:2605.23960

AI Analysis

ProtDiS provides a general and interpretable approach for structuring latent spaces in protein structural modeling, addressing the problem of entangled representations that obscure biophysical signals.

ProtDiS decomposes pretrained protein micro-environment embeddings into biologically grounded dimensions using an information bottleneck principle, achieving consistent improvements across twelve downstream tasks with largest gains under structure-based splits.

Proteins encode diverse functions within complex three-dimensional structures, yet most deep learning representations remain highly entangled, obscuring the biophysical signals that underlie function. Here we introduce ProtDiS, a knowledge-guided framework that decomposes pretrained protein micro-environment embeddings into biologically grounded and task-relevant dimensions. Inspired by the information bottleneck principle, ProtDiS learns representations that balance informativeness and compression, yielding structural features that are more specific, independent, and information-efficient, and achieving consistent improvements across twelve downstream tasks, with the largest gains under structure-based splits. Protein- and residue-level analyses further show that ProtDiS differentiates proteins with similar folds but divergent functions and captures critical fine-grained biophysical signals. These findings suggest that knowledge-guided decomposition provides a general and interpretable approach for structuring latent spaces in protein structural modeling. The source code and implementation details are publicly available at https://github.com/AI-HPC-Research-Team/ProtDiS.
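
For context, the trade-off between informativeness and compression mentioned in the abstract is usually written as the generic information bottleneck objective shown below. This is the textbook form only and is not necessarily the loss ProtDiS optimizes. The symbols are assumptions made here for illustration: X stands for a pretrained micro-environment embedding, Z for the decomposed representation, Y for a task or knowledge label, and β for the trade-off weight.

```latex
% Generic information bottleneck objective (illustrative; not taken from the paper).
% I(.;.) denotes mutual information; beta > 0 weights informativeness against compression.
\min_{p(z \mid x)} \; I(X; Z) \;-\; \beta \, I(Z; Y)
```

A small I(X; Z) means Z discards detail from the input (compression), while a large I(Z; Y) means Z retains what is needed to predict the target (informativeness). Larger β favors informativeness.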
