Mol-MoE: Training Preference-Guided Routers for Molecule Generation
Mol-MoE addresses the need for flexible multi-objective optimization in real-world drug design and offers a more efficient alternative to traditional multi-objective reinforcement learning methods, which require retraining for each new objective combination. The contribution is incremental, since it builds on existing mixture-of-experts and preference-based techniques.
The paper tackles molecule generation for drug design, where existing methods either rely on single-objective reinforcement learning or require costly retraining for each new combination of objectives. It introduces Mol-MoE, a mixture-of-experts architecture that enables efficient test-time steering without retraining, and reports superior sample quality and steerability in benchmarks against state-of-the-art methods.
Recent advances in language models have enabled framing molecule generation as sequence modeling. However, existing approaches often rely on single-objective reinforcement learning, limiting their applicability to real-world drug design, where multiple competing properties must be optimized. Traditional multi-objective reinforcement learning (MORL) methods require costly retraining for each new objective combination, making rapid exploration of trade-offs impractical. To overcome these limitations, we introduce Mol-MoE, a mixture-of-experts (MoE) architecture that enables efficient test-time steering of molecule generation without retraining. Central to our approach is a preference-based router training objective that incentivizes the router to combine experts in a way that aligns with user-specified trade-offs. This provides improved flexibility in exploring the chemical property space at test time, facilitating rapid trade-off exploration. Benchmarking against state-of-the-art methods, we show that Mol-MoE achieves superior sample quality and steerability.
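To make the idea of preference-guided routing concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify the form of the router or how expert outputs are combined, so everything here is an assumption for illustration: a hypothetical linear router (`route`) that maps a user preference vector to softmax gates, one expert per property, and a gate-weighted sum of per-expert next-token logits (`mix_expert_logits`). The router weights and logits are toy values, not learned parameters.

```python
import math
from typing import List


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route(preferences: List[float], router_weights: List[List[float]]) -> List[float]:
    """Hypothetical linear router (an assumption, not from the paper).

    Each row of router_weights scores one expert against the preference
    vector; the scores are normalized into gates that sum to 1.
    """
    scores = [sum(w * p for w, p in zip(row, preferences)) for row in router_weights]
    return softmax(scores)


def mix_expert_logits(gates: List[float], expert_logits: List[List[float]]) -> List[float]:
    """Combine per-expert next-token logits as a gate-weighted sum."""
    vocab_size = len(expert_logits[0])
    return [
        sum(g * logits[t] for g, logits in zip(gates, expert_logits))
        for t in range(vocab_size)
    ]


# Toy setup: two experts, each assumed to be tuned for one property,
# scoring a three-token vocabulary.
ROUTER_WEIGHTS = [[4.0, 0.0], [0.0, 4.0]]  # illustrative values only
EXPERT_LOGITS = [
    [2.0, 0.0, -1.0],  # expert 0 prefers token 0
    [-1.0, 0.0, 2.0],  # expert 1 prefers token 2
]

# A user who mostly cares about property 0 steers generation at test time
# by changing only the preference vector; no retraining is involved.
gates = route([0.9, 0.1], ROUTER_WEIGHTS)
mixed = mix_expert_logits(gates, EXPERT_LOGITS)
```

Changing the preference vector to `[0.1, 0.9]` shifts the gates toward the second expert and, in this toy setup, makes token 2 the most likely next token. In the paper's setting the router would be trained with the preference-based objective described above, rather than fixed by hand as it is here.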