MolChord: Structure-Sequence Alignment for Protein-Guided Drug Design
For researchers and practitioners in drug discovery, this work addresses a critical problem: aligning generated drugs with their pharmacological properties. It appears incremental, however, since it builds on existing techniques such as autoregressive models and DPO.
The paper tackles the challenge of aligning protein structural representations with molecular representations in structure-based drug design. It proposes MolChord, which pairs an autoregressive model for molecule generation with a diffusion-based structure encoder and uses Direct Preference Optimization (DPO) for property guidance, achieving state-of-the-art performance on key evaluation metrics on CrossDocked2020.
Structure-based drug design (SBDD), which maps target proteins to candidate molecular ligands, is a fundamental task in drug discovery. Effectively aligning protein structural representations with molecular representations, and ensuring alignment between generated drugs and their pharmacological properties, remain critical challenges. To address these challenges, we propose MolChord, which integrates two key techniques: (1) to align protein and molecule structures with their textual descriptions and sequential representations (e.g., FASTA for proteins and SMILES for molecules), we leverage NatureLM, an autoregressive model unifying text, small molecules, and proteins, as the molecule generator, alongside a diffusion-based structure encoder; and (2) to guide molecules toward desired properties, we curate a property-aware dataset by integrating preference data and refine the alignment process using Direct Preference Optimization (DPO). Experimental results on CrossDocked2020 demonstrate that our approach achieves state-of-the-art performance on key evaluation metrics, highlighting its potential as a practical tool for SBDD.
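The abstract does not give MolChord's exact training objective or the format of its property-aware preference data. As a rough illustration of how standard DPO can steer a sequence generator toward preferred molecules, here is a minimal Python sketch under stated assumptions: each generated ligand (as a SMILES string) carries its total log-probability under the model being tuned and under a frozen reference model, and a hypothetical scalar property score (higher is better) decides which molecule in a pair is "chosen". The names Candidate, preference_pairs, dpo_loss, and the margin parameter are illustrative only and do not come from the paper.

```python
import math
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Candidate:
    """One generated ligand for a given pocket (hypothetical schema)."""
    smiles: str
    policy_logp: float  # log p_theta(y | x): summed token log-probs under the model being tuned
    ref_logp: float     # log p_ref(y | x): same sequence under the frozen reference model
    score: float        # hypothetical property score; higher means more desirable


def preference_pairs(cands, margin=0.0):
    """Build (chosen, rejected) pairs from scored candidates.

    Assumption: a pair is kept only if the score gap exceeds `margin`,
    so near-ties do not produce noisy preferences.
    """
    pairs = []
    for a, b in combinations(cands, 2):
        if abs(a.score - b.score) <= margin:
            continue
        chosen, rejected = (a, b) if a.score > b.score else (b, a)
        pairs.append((chosen, rejected))
    return pairs


def log_sigmoid(z):
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(chosen, rejected, beta=0.1):
    """Standard DPO loss for one pair:
    -log sigmoid(beta * [(log pi - log ref)(chosen) - (log pi - log ref)(rejected)]).
    """
    chosen_ratio = chosen.policy_logp - chosen.ref_logp
    rejected_ratio = rejected.policy_logp - rejected.ref_logp
    return -log_sigmoid(beta * (chosen_ratio - rejected_ratio))


def mean_dpo_loss(pairs, beta=0.1):
    """Average DPO loss over all preference pairs."""
    return sum(dpo_loss(c, r, beta) for c, r in pairs) / len(pairs)


if __name__ == "__main__":
    # Toy candidates: the policy still equals the reference model.
    cands = [
        Candidate("CCO", policy_logp=-10.0, ref_logp=-10.0, score=0.9),
        Candidate("c1ccccc1", policy_logp=-12.0, ref_logp=-12.0, score=0.3),
    ]
    pairs = preference_pairs(cands)
    print(mean_dpo_loss(pairs))  # → log(2), about 0.693, since both log-ratios are zero
```

When the tuned model matches the reference, every pair contributes log 2; the loss falls below that as the model raises the relative likelihood of higher-scoring molecules. The beta value and the pairing rule here are placeholder choices, not settings reported for MolChord.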