Gradient-Free Noise Optimization for Reward Alignment in Generative Models

Jeongsol Kim, Hongeun Kim, Jian Wang, Jong Chul Ye

arXiv:2605.1134794.4

AI Analysis

Provides a gradient-free method for reward alignment in deterministic generative models, addressing a key limitation of existing approaches.

ZeNO enables gradient-free noise optimization for reward alignment in generative models, achieving strong performance across diverse generators and reward functions, including protein structure generation where backpropagation is infeasible.

Existing reward alignment methods for diffusion and flow models rely on multi-step stochastic trajectories, making them difficult to extend to deterministic generators. A natural alternative is noise-space optimization, but existing approaches require backpropagation through the generator and reward pipeline, limiting applicability to differentiable settings. To address this, we present ZeNO (Zeroth-order Noise Optimization), a gradient-free framework that formulates noise optimization as a path-integral control problem that can be estimated from zeroth-order reward evaluations alone. When instantiated with an Ornstein-Uhlenbeck reference process, the update connects to Langevin dynamics that implicitly target a reward-tilted distribution. ZeNO enables effective inference-time scaling and demonstrates strong performance across diverse generators and reward functions, including a protein structure generation task where backpropagation is infeasible.
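To make the idea of gradient-free noise optimization concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a generic importance-weighted update of the kind used in path-integral control: candidate noises are proposed with an Ornstein-Uhlenbeck-style step that preserves the Gaussian prior, each candidate is scored using only black-box reward evaluations, and the noise is replaced by a softmax-weighted average of the candidates. The names `generator`, `reward`, `zeroth_order_step`, `optimize_noise`, and the parameters `beta` and `temperature` are all hypothetical and chosen for illustration only.

```python
import math
import random


def generator(z):
    # Hypothetical deterministic generator, treated as a black box (no gradients used).
    return [2.0 * math.tanh(v) for v in z]


def reward(x):
    # Hypothetical black-box reward: negative squared distance to a fixed target.
    target = [1.0, -1.0]
    return -sum((a - b) ** 2 for a, b in zip(x, target))


def zeroth_order_step(z, rng, num_samples=32, beta=0.1, temperature=0.1):
    # OU-style proposals: z_i = sqrt(1 - beta) * z + sqrt(beta) * eps_i, eps_i ~ N(0, I).
    keep, jitter = math.sqrt(1.0 - beta), math.sqrt(beta)
    candidates = [
        [keep * v + jitter * rng.gauss(0.0, 1.0) for v in z]
        for _ in range(num_samples)
    ]
    # Zeroth-order information only: one reward evaluation per candidate.
    rewards = [reward(generator(c)) for c in candidates]
    # Softmax weights exp(r / temperature), shifted by the max for numerical stability.
    r_max = max(rewards)
    weights = [math.exp((r - r_max) / temperature) for r in rewards]
    total = sum(weights)
    # New noise is the reward-weighted average of the candidates.
    return [
        sum(w * c[d] for w, c in zip(weights, candidates)) / total
        for d in range(len(z))
    ]


def optimize_noise(z_init, steps=50, seed=0):
    rng = random.Random(seed)
    z = list(z_init)
    initial_reward = reward(generator(z))
    for _ in range(steps):
        z = zeroth_order_step(z, rng)
    return initial_reward, reward(generator(z)), z


if __name__ == "__main__":
    before, after, z_final = optimize_noise([-1.0, 1.0])
    print("reward before:", before)
    print("reward after:", after)
```

In this sketch, increasing `num_samples` or `steps` spends more reward evaluations at inference time, which is one simple way to picture inference-time scaling. A smaller `temperature` makes the update greedier, while `beta` controls how far each proposal moves from the current noise.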
