BM · LG · May 22

An accurate nucleic acid-small molecule docking framework via geometric deep learning with large-scale pretraining

arXiv:2606.05198 · 62.5
Predicted impact top 35% in BM · last 90 days · Originality: Incremental advance
AI Analysis

Provides an accurate and efficient docking method for nucleic acid targets, addressing data scarcity in a domain that lags behind protein-focused drug discovery.

NucleoDock, a deep learning framework for nucleic acid-small molecule docking, achieves a top-1 success rate of 56% at 2.0 Å RMSD on 125 complexes, outperforming rDock (29%), and generates 100 poses in ~5 seconds per complex.
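The headline metric can be made concrete with a minimal sketch. It assumes that "top-1 success" means the highest-ranked pose of a complex has a ligand RMSD at or below the cutoff; whether the comparison is inclusive is an assumption, and the function name and example values are hypothetical, not taken from the paper. At 125 complexes, a 56% rate works out to about 70 successes.

```python
def top1_success_rate(top1_rmsds, cutoff=2.0):
    """Fraction of complexes whose top-ranked pose has RMSD <= cutoff (in Å).

    Assumption: the cutoff is inclusive; the source does not say.
    """
    if not top1_rmsds:
        raise ValueError("need at least one complex")
    hits = sum(1 for r in top1_rmsds if r <= cutoff)
    return hits / len(top1_rmsds)


# Hypothetical RMSD values for illustration only, not data from the paper.
example_rmsds = [0.8, 1.9, 2.4, 5.1]
rate = top1_success_rate(example_rmsds)  # two of the four poses are within 2.0 Å
```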

Nucleic acids are increasingly recognized as therapeutic targets beyond conventional protein-centered drug discovery, yet accurate and efficient docking of small molecules to nucleic acid structures remains challenging. Physics-based docking methods often show limited accuracy and efficiency, whereas deep learning approaches are constrained by the scarcity of experimentally resolved nucleic acid-ligand complexes. Here, we present NucleoDock, a deep learning framework for nucleic acid-small molecule docking. To address data scarcity, NucleoDock combines physics-guided large-scale pretraining on millions of docking-generated synthetic complexes with fine-tuning on curated experimental co-crystal structures. It further integrates sequence- and structure-informed nucleotide representations with atomistic three-dimensional features to capture both biological context and binding-site geometry. A mixture density network-based geometric scoring head is used to model conditional interaction-distance distributions for pose ranking. On an external benchmark of 125 nucleic acid-ligand complexes, NucleoDock achieved a top-1 success rate of 56 percent at an RMSD cutoff of 2.0 Angstrom, outperforming rDock with 29 percent, while generating 100 poses in approximately 5 seconds per complex. Retrospective virtual screening on the ROBIN benchmark further showed improved early enrichment. NucleoDock represents a step toward bridging the methodological gap between protein- and nucleic acid-directed computational drug discovery.
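The scoring idea in the abstract can be illustrated with a small sketch. It assumes the common mixture density network formulation in which each ligand-atom/target-atom pair gets a one-dimensional Gaussian mixture over distance, and a pose is scored by the summed log-likelihood of its observed distances. The abstract does not give NucleoDock's exact formulation, so the function names and the hard-coded mixture parameters below are hypothetical stand-ins for what a trained network would predict.

```python
import math


def gaussian_mixture_logpdf(d, weights, means, sigmas):
    """Log-density of distance d (Å) under a 1-D Gaussian mixture."""
    terms = []
    for w, mu, s in zip(weights, means, sigmas):
        log_norm = -0.5 * math.log(2.0 * math.pi) - math.log(s)
        terms.append(math.log(w) + log_norm - 0.5 * ((d - mu) / s) ** 2)
    m = max(terms)
    # log-sum-exp keeps the computation stable for very unlikely distances
    return m + math.log(sum(math.exp(t - m) for t in terms))


def pose_score(distances, mixtures):
    """Sum of per-pair log-likelihoods; higher means a more plausible pose."""
    return sum(gaussian_mixture_logpdf(d, *mix) for d, mix in zip(distances, mixtures))


def rank_poses(poses, mixtures):
    """Return pose indices sorted from best to worst score."""
    scores = [pose_score(p, mixtures) for p in poses]
    return sorted(range(len(poses)), key=lambda i: scores[i], reverse=True)


# Hypothetical mixture parameters for two atom pairs:
# (weights, means in Å, standard deviations in Å).
# In a real model these would be predicted per pair by the network.
mixtures = [
    ([0.7, 0.3], [3.0, 5.0], [0.5, 1.0]),
    ([1.0], [4.0], [0.5]),
]

# Each pose is the list of observed distances for the same two atom pairs.
poses = [
    [6.5, 7.0],  # distances far from the predicted modes
    [3.1, 4.0],  # distances close to the predicted modes
]
order = rank_poses(poses, mixtures)  # pose 1 ranks ahead of pose 0
```

Scoring by log-likelihood rather than by a single predicted distance lets the model express multimodal preferences, for example a pair that is favorable at either of two contact distances.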

