LG · CV · May 29

Parallel Tempering Initial Sampling in Inference-Time Reward Alignment

arXiv:2605.30991 · 8.1 · h-index: 4
Predicted impact: top 67% in LG · last 90 days · Originality: Incremental advance
AI Analysis

This work provides an incremental improvement for researchers and practitioners working on inference-time reward alignment for generative models, particularly those dealing with complex reward landscapes.

This paper addresses the challenge of poor initial sampling in inference-time reward alignment for generative models, where existing methods struggle to find high-reward regions or get trapped in local modes. The authors propose PATHS, a novel initialization method using parallel tempering, which significantly enhances exploration and achieves consistent gains in alignment quality, especially for complex prompts in layout-to-image and quantity-aware generation.

Inference-time reward alignment steers pretrained diffusion and flow-based generative models to satisfy user-specified rewards without retraining. Recently, Sequential Monte Carlo (SMC) has emerged as a powerful framework for this task by iteratively filtering and propagating multiple particles. However, we show that standard SMC-based methods often suffer from poor performance because they initialize particles from a standard prior, whereas high-reward regions in complex reward landscapes are extremely rare. Further, we show that even recent reward-aware initial sampling approaches remain vulnerable to getting trapped in local modes, as complex reward landscapes are often multi-modal. To overcome these limitations, we propose PATHS (PArallel Tempering for High-complexity reward Sampling), a novel initialization method that couples multiple sampling chains through parallel tempering. PATHS maintains a ladder of reward-tempered chains and periodically performs Metropolis swaps, enabling efficient exploration across flattened reward landscapes, thereby mitigating the mode-trapping issues. Our analysis reveals that this mechanism substantially enhances the finite-budget exploration of rare, high-reward regions that are typically challenging to sample. Experiments on layout-to-image and quantity-aware generation show that PATHS achieves consistent gains in alignment quality, particularly on complex prompts.
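The abstract describes PATHS only at a high level. The following is a minimal sketch of reward-tempered parallel tempering under stated assumptions, not the authors' implementation. It assumes rung k targets pi_k(x) proportional to N(x; 0, I) * exp(beta_k * r(x)) over the initial noise x, with beta = 0 giving the standard prior and the largest beta giving the target. The within-chain move is a preconditioned Crank-Nicolson (pCN) proposal. This is a swapped-in choice, because the abstract does not name the local kernel. Since pCN leaves N(0, I) invariant, only the tempered reward enters its acceptance ratio. Exchanging the states of rungs i and j is accepted with probability min(1, exp((beta_i - beta_j) * (r(x_j) - r(x_i)))), because the prior factors cancel. The reward `toy_reward`, the temperature ladder, and every function name are hypothetical. In the real setting, r would be a reward model scored on the generator's output for noise x.

```python
import math
import random


def toy_reward(x):
    """Hypothetical multi-modal reward: log of two narrow bumps far from the prior bulk."""
    peaks = [(3.0, 3.0), (-3.0, 3.0)]
    width = 0.3
    return max(-((x[0] - px) ** 2 + (x[1] - py) ** 2) / (2 * width ** 2) for px, py in peaks)


def swap_log_ratio(beta_i, beta_j, r_i, r_j):
    """Log Metropolis ratio for exchanging the states held by rungs i and j.

    The Gaussian prior factor cancels, leaving (beta_i - beta_j) * (r_j - r_i).
    """
    return (beta_i - beta_j) * (r_j - r_i)


def pcn_step(x, beta, reward, step, rng):
    """One pCN Metropolis move targeting N(0, I) * exp(beta * reward).

    The pCN proposal leaves N(0, I) invariant, so only the reward term
    enters the acceptance ratio.
    """
    root = math.sqrt(1.0 - step ** 2)
    prop = [root * xi + step * rng.gauss(0.0, 1.0) for xi in x]
    log_alpha = beta * (reward(prop) - reward(x))
    if rng.random() < math.exp(min(0.0, log_alpha)):
        return prop
    return x


def parallel_tempering(reward, betas, dim, n_iters, step=0.3, swap_every=5, seed=0):
    """Run a ladder of reward-tempered chains with periodic adjacent swaps.

    betas must be sorted ascending; betas[-1] is the target ("cold") rung.
    Returns final states, the accepted-swap count, and cold-rung snapshots.
    """
    rng = random.Random(seed)
    chains = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in betas]
    n_swaps = 0
    cold_samples = []
    for it in range(1, n_iters + 1):
        # Local exploration: every rung moves independently at its own temperature.
        chains = [pcn_step(x, b, reward, step, rng) for x, b in zip(chains, betas)]
        if it % swap_every == 0:
            # Metropolis swaps between neighbouring rungs of the ladder.
            for k in range(len(betas) - 1):
                log_alpha = swap_log_ratio(
                    betas[k], betas[k + 1], reward(chains[k]), reward(chains[k + 1])
                )
                if rng.random() < math.exp(min(0.0, log_alpha)):
                    chains[k], chains[k + 1] = chains[k + 1], chains[k]
                    n_swaps += 1
            cold_samples.append(list(chains[-1]))
    return chains, n_swaps, cold_samples


if __name__ == "__main__":
    betas = [0.0, 0.01, 0.05, 0.2, 1.0]
    chains, n_swaps, cold = parallel_tempering(toy_reward, betas, dim=2, n_iters=2000)
    particles = cold[-8:]  # hypothetical: last cold-rung snapshots as initial SMC particles
    print("accepted swaps:", n_swaps)
    print("best reward among particles:", max(toy_reward(p) for p in particles))
```

Rungs with small beta see a flattened landscape and can move between modes. Accepted swaps pass the states they reach down to the cold rung, which matches the abstract's account of how PATHS mitigates mode trapping. Using the last cold-rung snapshots as initial SMC particles is an illustrative choice, not one stated in the abstract.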

