Matcha: Multi-Stage Riemannian Flow Matching for Accurate and Physically Valid Molecular Docking
This work proposes a new method for a known bottleneck in structure-based drug design: balancing speed, accuracy, and physical validity in molecular docking.
The paper addresses protein-ligand binding pose prediction for drug design by introducing Matcha, a multi-stage flow matching pipeline that achieves higher docking success rates and better physical plausibility than the compared methods on the Astex and PDBbind test sets, while running approximately 25 times faster than large-scale co-folding models.
Accurate prediction of protein-ligand binding poses is crucial for structure-based drug design, yet existing methods struggle to balance speed, accuracy, and physical plausibility. We introduce Matcha, a novel molecular docking pipeline that combines multi-stage flow matching with learned scoring and physical validity filtering. Our approach consists of three sequential stages that progressively refine docking predictions, each implemented as a flow matching model operating on appropriate geometric spaces ($\mathbb{R}^3$, $\mathrm{SO}(3)$, and $\mathrm{SO}(2)$). We enhance prediction quality with a dedicated scoring model and apply unsupervised physical validity filters to eliminate unrealistic poses. Compared with the baseline approaches we evaluate, Matcha achieves a higher docking success rate and better physical plausibility on the Astex and PDBbind test sets. Moreover, our method runs approximately 25 times faster than modern large-scale co-folding models. The model weights and inference code to reproduce our results are available at https://github.com/LigandPro/Matcha.
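To make the three geometric spaces concrete, the sketch below shows one way the interpolation paths used in Riemannian flow matching could look on $\mathbb{R}^3$, $\mathrm{SO}(3)$, and $\mathrm{SO}(2)$. It is not taken from the Matcha code. It assumes, as is common in docking models, that $\mathbb{R}^3$ holds the ligand translation, $\mathrm{SO}(3)$ the rigid rotation, and $\mathrm{SO}(2)$ each torsion angle. All function names are hypothetical, and geodesic interpolation is only one possible choice of path.

```python
import numpy as np

def rotvec_to_matrix(v):
    """Exponential map of SO(3) (Rodrigues' formula): axis * angle -> 3x3 matrix."""
    theta = np.linalg.norm(v)
    if theta < 1e-12:
        return np.eye(3)
    k = v / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def matrix_to_rotvec(R):
    """Log map of SO(3); valid for rotation angles strictly below pi."""
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    if theta < 1e-12:
        return np.zeros(3)
    w = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    return theta * w / (2.0 * np.sin(theta))

def interp_r3(x0, x1, t):
    """Straight-line path in R^3 (assumed here to model ligand translation)."""
    return (1.0 - t) * x0 + t * x1

def interp_so3(R0, R1, t):
    """Geodesic on SO(3): R0 * exp(t * log(R0^T R1)) (assumed: rigid rotation)."""
    return R0 @ rotvec_to_matrix(t * matrix_to_rotvec(R0.T @ R1))

def interp_so2(a0, a1, t):
    """Shortest-arc path on SO(2), angles in radians wrapped to [-pi, pi)
    (assumed: one torsion angle)."""
    delta = (a1 - a0 + np.pi) % (2.0 * np.pi) - np.pi
    return (a0 + t * delta + np.pi) % (2.0 * np.pi) - np.pi

# Example: a point halfway between a noisy pose and a target pose.
R_target = rotvec_to_matrix(np.array([0.3, -0.2, 0.5]))
x_half = interp_r3(np.zeros(3), np.array([1.0, 2.0, 3.0]), 0.5)
R_half = interp_so3(np.eye(3), R_target, 0.5)
a_half = interp_so2(0.2, 1.0, 0.5)
```

In a flow matching setup of this kind, a model would typically be trained to predict the velocity along such paths, and inference would integrate that velocity from noise to a pose. How Matcha splits this across its three stages is not specified in the abstract.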