QM LG Dec 2, 2025

Molecular Embedding-Based Algorithm Selection in Protein-Ligand Docking

Jiabao Brad Wang, Siyuan Cao, Hongxuan Wu, Yiliang Yuan, Mustafa Misir

arXiv:2512.02328v1 1.2 h-index: 2

Originality: Incremental advance

AI Analysis

This work addresses the challenge of algorithm selection in protein-ligand docking for computational biology and drug discovery, but the advance is incremental: it builds on existing embedding methods and focuses on specific conditions.

The paper tackles the problem of selecting an effective algorithm for protein-ligand docking, a choice that is context-dependent. It introduces MolAS, a lightweight system that uses pretrained embeddings to predict per-algorithm performance. MolAS achieves up to 15% absolute improvement over the single-best solver and closes 17-66% of the gap to the virtual best solver across five docking benchmarks.
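The "gap closed" figure above can be made concrete with a small sketch. It assumes a hypothetical score matrix `perf` (one row per complex, one column per docking algorithm, higher is better, e.g. 1.0 for a successful dock) and a list `chosen` of the algorithm index the selector picked for each complex. The function name, the toy data, and the higher-is-better convention are illustrative assumptions, not taken from the paper, which may define its scores and compute the single-best solver on a separate training split.

```python
def selection_summary(perf, chosen):
    """Summarise a selector against the single-best and virtual best solvers.

    perf   : list of rows, perf[i][j] = score of solver j on instance i
             (assumed higher is better)
    chosen : list of solver indices, chosen[i] = selector's pick for instance i
    """
    n = len(perf)
    n_solvers = len(perf[0])

    # Single-best solver: the one fixed solver with the best mean score.
    solver_means = [sum(row[j] for row in perf) / n for j in range(n_solvers)]
    sbs = max(solver_means)

    # Virtual best solver: an oracle that picks the best solver per instance.
    vbs = sum(max(row) for row in perf) / n

    # Mean score actually obtained by following the selector.
    selector = sum(row[j] for row, j in zip(perf, chosen)) / n

    # Fraction of the VBS-SBS gap recovered; undefined when there is no gap.
    gap = vbs - sbs
    closed = (selector - sbs) / gap if gap > 0 else float("nan")
    return {"sbs": sbs, "vbs": vbs, "selector": selector, "gap_closed": closed}


# Toy data: 4 complexes, 3 solvers, binary success scores.
perf = [
    [1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
chosen = [0, 0, 1, 0]  # the selector misses only the last complex
print(selection_summary(perf, chosen))
```

On this toy data the single-best solver scores 0.5, the virtual best solver 1.0, and the selector 0.75, so the selector gains 25 percentage points in absolute terms and closes half of the gap. The two headline numbers in the summary correspond to these two quantities.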

Selecting an effective docking algorithm is highly context-dependent, and no single method performs reliably across structural, chemical, or protocol regimes. We introduce MolAS, a lightweight algorithm selection system that predicts per-algorithm performance from pretrained protein-ligand embeddings using attentional pooling and a shallow residual decoder. With only hundreds to a few thousand labelled complexes, MolAS achieves up to 15% absolute improvement over the single-best solver (SBS) and closes 17-66% of the Virtual Best Solver (VBS)-SBS gap across five diverse docking benchmarks. Analyses of reliability, embedding geometry, and solver-selection patterns show that MolAS succeeds when the oracle landscape exhibits low entropy and separable solver behaviour, but collapses under protocol-induced hierarchy shifts. These findings indicate that the main barrier to robust docking AS is not representational capacity but instability in solver rankings across pose-generation regimes, positioning MolAS as both a practical in-domain selector and a diagnostic tool for assessing when AS is feasible.
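One way to read "low entropy" in the oracle landscape is as the Shannon entropy of the distribution of per-complex winning solvers. The sketch below computes that quantity under this assumption; the function name `oracle_entropy`, the tie-breaking rule, and the toy matrix are hypothetical, and the paper's own entropy definition may differ.

```python
import math
from collections import Counter


def oracle_entropy(perf):
    """Shannon entropy (in bits) of which solver wins each instance.

    perf[i][j] is the score of solver j on instance i (assumed higher is
    better). Ties go to the lowest solver index.
    """
    # Index of the best-scoring solver for every instance.
    winners = [max(range(len(row)), key=row.__getitem__) for row in perf]
    counts = Counter(winners)
    n = len(winners)
    # Entropy of the empirical distribution over winning solvers.
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Toy data: solver 0 wins twice, solvers 1 and 2 win once each.
perf = [
    [0.9, 0.1, 0.2],
    [0.8, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.2, 0.6],
]
print(oracle_entropy(perf))
```

Under this reading, an entropy near zero means one solver wins almost everywhere, so the single-best solver is already close to the oracle. An entropy near log2 of the number of solvers means the wins are spread evenly, which leaves more room for selection but only helps if the winner is predictable from the inputs.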
