Kernel Density Steering: Inference-Time Scaling via Mode Seeking for Image Restoration
This work addresses image restoration quality issues for users of diffusion models. It offers a plug-and-play solution that requires no retraining, though the contribution is incremental because it builds on existing diffusion methods.
The paper tackles inconsistent fidelity and artifacts in diffusion-based image restoration by introducing Kernel Density Steering (KDS), an inference-time framework that uses an ensemble of samples to steer patches toward higher-density regions. The authors report substantial improvements on super-resolution and inpainting tasks.
Diffusion models show promise for image restoration, but existing methods often struggle with inconsistent fidelity and undesirable artifacts. To address this, we introduce Kernel Density Steering (KDS), a novel inference-time framework promoting robust, high-fidelity outputs through explicit local mode-seeking. KDS employs an $N$-particle ensemble of diffusion samples, computing patch-wise kernel density estimation gradients from their collective outputs. These gradients steer patches in each particle towards shared, higher-density regions identified within the ensemble. This collective local mode-seeking mechanism, acting as "collective wisdom", steers samples away from spurious, artifact-prone modes that arise from independent sampling or model imperfections, and towards more robust, high-fidelity structures. By sampling multiple particles simultaneously, we obtain higher-quality samples at the expense of higher compute. As a plug-and-play framework, KDS requires no retraining or external verifiers and integrates seamlessly with various diffusion samplers. Extensive numerical validations demonstrate that KDS substantially improves both quantitative and qualitative performance on challenging real-world super-resolution and image inpainting tasks.
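To make the patch-wise mode-seeking idea concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a Gaussian kernel, for which the gradient of the log kernel density estimate is proportional to the mean-shift vector (the kernel-weighted ensemble mean minus the current point). The function name `mean_shift_step` and the parameters `bandwidth` and `step_size` are hypothetical. The abstract does not specify the kernel, the patch size, or how the update is interleaved with the diffusion sampler, so none of those details are modeled here.

```python
import numpy as np

def mean_shift_step(patches, bandwidth=1.0, step_size=0.5):
    """One Gaussian-kernel mean-shift step on an ensemble of patches.

    patches: array of shape (N, D), one flattened patch per particle,
             all taken from the same spatial location (an assumption of
             this sketch, not a detail stated in the abstract).
    Returns an array of the same shape in which each patch has moved
    toward the kernel-weighted mean of the ensemble.
    """
    patches = np.asarray(patches, dtype=float)
    # diffs[i, j] = x_j - x_i, shape (N, N, D)
    diffs = patches[None, :, :] - patches[:, None, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    # Gaussian kernel weights, normalized so each row sums to 1
    weights = np.exp(-sq_dists / (2.0 * bandwidth ** 2))
    weights /= weights.sum(axis=1, keepdims=True)
    # Mean-shift vector for each particle: weighted mean minus current point
    shift = np.einsum("ij,ijd->id", weights, diffs)
    return patches + step_size * shift

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Five particles near a shared mode plus one outlier "artifact" patch
    ensemble = np.vstack([rng.normal(0.0, 0.1, size=(5, 4)),
                          np.full((1, 4), 1.5)])
    steered = mean_shift_step(ensemble, bandwidth=1.0, step_size=0.5)
    print("outlier before:", ensemble[-1])
    print("outlier after: ", steered[-1])
```

With `step_size` at most 1, each updated patch is a convex combination of the ensemble's patches, so an outlier is pulled toward the region where most particles agree while patches already in that region move very little. In a full pipeline this step would presumably be applied to patches of the intermediate estimates at each sampling step, which is an assumption here.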