Spectral Evolution Search: Efficient Inference-Time Scaling for Reward-Aligned Image Generation
This work addresses the computational inefficiency of aligning visual generative models without parameter updates. It is an incremental advance of interest to researchers and practitioners in image generation.
The paper tackles the inefficiency of inference-time scaling for aligning image generation models by identifying a spectral bias in generative dynamics. It proposes Spectral Evolution Search (SES), which optimizes the initial noise within a low-frequency subspace and improves the trade-off between generation quality and computational cost.
Inference-time scaling offers a versatile paradigm for aligning visual generative models with downstream objectives without parameter updates. However, existing approaches that optimize the high-dimensional initial noise suffer from severe inefficiency, as many search directions exert negligible influence on the final generation. We show that this inefficiency is closely related to a spectral bias in generative dynamics: model sensitivity to initial perturbations diminishes rapidly as frequency increases. Building on this insight, we propose Spectral Evolution Search (SES), a plug-and-play framework for initial noise optimization that executes gradient-free evolutionary search within a low-frequency subspace. Theoretically, we derive the Spectral Scaling Prediction from perturbation propagation dynamics, which explains the systematic differences in the impact of perturbations across frequencies. Extensive experiments demonstrate that SES significantly advances the Pareto frontier of generation quality versus computational cost, consistently outperforming strong baselines under equivalent budgets.
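The idea of running a gradient-free evolutionary search over only the low-frequency part of the initial noise can be illustrated with a minimal sketch. This is not the authors' implementation: the DCT parameterization, the simple elite-averaging update, the function names (`dct_matrix`, `to_noise`, `evolve_low_freq`), all hyperparameters, and `toy_reward` (a stand-in for running a generator and scoring its output with a reward model) are assumptions made for illustration.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis; rows are ordered from low to high frequency."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0] /= np.sqrt(2.0)
    return m

def to_noise(low, coeffs, basis):
    """Overwrite the low-frequency k x k block, then map back to pixel space."""
    out = coeffs.copy()
    k = low.shape[0]
    out[:k, :k] = low
    return basis.T @ out @ basis

def evolve_low_freq(reward_fn, size=32, k=4, pop=16, elite=4,
                    steps=20, sigma=0.3, seed=0):
    """Assumed search loop: perturb only the k x k low-frequency coefficients."""
    rng = np.random.default_rng(seed)
    basis = dct_matrix(size)
    base = rng.standard_normal((size, size))  # high frequencies stay fixed
    mean = base[:k, :k].copy()
    best_low = mean.copy()
    best_r = reward_fn(to_noise(mean, base, basis))
    for _ in range(steps):
        # Sample candidates around the current mean in the k*k-dim subspace.
        cands = mean + sigma * rng.standard_normal((pop, k, k))
        rewards = np.array([reward_fn(to_noise(c, base, basis)) for c in cands])
        order = np.argsort(rewards)[::-1]
        # Move the search mean to the average of the elite candidates.
        mean = cands[order[:elite]].mean(axis=0)
        if rewards[order[0]] > best_r:
            best_r, best_low = rewards[order[0]], cands[order[0]].copy()
    return to_noise(best_low, base, basis), best_r

def toy_reward(x):
    # Hypothetical reward: prefers the top half brighter than the bottom half.
    h = x.shape[0] // 2
    return -(x[:h].mean() - x[h:].mean() - 1.0) ** 2

noise, reward = evolve_low_freq(toy_reward)
```

With `size=32` and `k=4`, the search runs over 16 coefficients instead of 1024 noise values, which is the kind of dimensionality reduction the abstract motivates. The sketch does not constrain the optimized coefficients to stay close to the Gaussian prior; a practical version would likely need some mechanism for that.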