Amortized Molecular Optimization via Group Relative Policy Optimization
This work presents an amortized method for molecular design that improves efficiency on tasks such as structural alteration, though the contribution is incremental in that it builds on existing model-based approaches.
The paper tackles inefficient molecular optimization by addressing the high variance caused by heterogeneous starting structures. It introduces GRXForm, fine-tuned with Group Relative Policy Optimization, and reports competitive multi-objective optimization scores without inference-time oracle calls.
Molecular design encompasses tasks ranging from de novo design to structural alteration of given molecules or fragments. For the latter, state-of-the-art methods predominantly function as "Instance Optimizers", expending significant compute by restarting the search for every input structure. While model-based approaches theoretically offer amortized efficiency by learning a policy transferable to unseen structures, existing methods struggle to generalize. We identify a key failure mode: the high variance arising from the heterogeneous difficulty of distinct starting structures. To address this, we introduce GRXForm, adapting a pre-trained Graph Transformer model that optimizes molecules via sequential atom-and-bond additions. We employ Group Relative Policy Optimization (GRPO) for goal-directed fine-tuning, which mitigates this variance by normalizing rewards relative to the starting structure. Empirically, GRXForm generalizes to out-of-distribution molecular scaffolds without inference-time oracle calls or refinement, achieving scores in multi-objective optimization competitive with leading instance optimizers.
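To make the variance argument concrete, the following is a minimal sketch of group-relative reward normalization, assuming the commonly used GRPO form in which each rollout's reward is standardized against the mean and standard deviation of its own group. Here a group is taken to be all rollouts that share one starting structure. The function name, the scaffold labels, and the reward values are hypothetical illustrations, and the exact normalization used by GRXForm may differ.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards_by_start, eps=1e-8):
    """Standardize rewards within each group of rollouts.

    rewards_by_start maps a starting-structure label to the list of
    rewards obtained by rollouts begun from that structure (assumed layout).
    """
    advantages = {}
    for start, rewards in rewards_by_start.items():
        mu = mean(rewards)        # per-start baseline
        sigma = pstdev(rewards)   # per-start spread (population std)
        # eps guards against division by zero when all rewards are equal
        advantages[start] = [(r - mu) / (sigma + eps) for r in rewards]
    return advantages


# Hypothetical rewards: one start is easy to improve, the other is hard.
rewards_by_start = {
    "easy_scaffold": [0.80, 0.90, 0.85, 0.95],
    "hard_scaffold": [0.10, 0.20, 0.15, 0.25],
}
advantages = group_relative_advantages(rewards_by_start)
```

With a single global baseline, every rollout from the hard start would receive a negative learning signal simply because its raw rewards are low. After per-group normalization, both groups in this toy example yield essentially the same advantages, so the policy is rewarded for doing better than its own average on each starting structure, regardless of how difficult that structure is.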