CV · Jun 2
Follow-Your-Preference++: Rethinking Preference Alignment for Image Inpainting
Junkun Yuan, Yutao Shen, Toru Aonishi et al.
We study preference alignment for image inpainting. Rather than proposing yet another method, we revisit the problem from first principles and reassess its core challenges. We adopt the widely used direct preference optimization framework and construct preference training data with publicly available reward models. Our empirical study spans nine reward models, two benchmarks, and two baseline inpainting models that differ in architecture and generative mechanism. Our main findings are: (1) Most reward models provide valid signals for preference data construction, although some are unreliable as evaluators. (2) Across models and benchmarks, preference data exhibits consistent trends under both candidate and sample scaling. (3) Reward models display pronounced biases, particularly in brightness, composition, and color scheme, that make them prone to inducing reward hacking. (4) A simple ensemble of reward models mitigates such biases and yields robust, generalizable performance. (5) Preference alignment is transferable to the object removal task, where the goal shifts from open-ended creative generation to coherent background completion. (6) Additional analysis shows that a calibrated ensemble method further mitigates reward hacking and improves robustness. Without modifying model architectures or introducing additional datasets, our models substantially outperform prior state-of-the-art models on standard metrics, large vision-language model evaluations, and human assessments. Our code is available at: https://github.com/shenytzzz/Follow-Your-Preference.
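A minimal sketch of how a reward-model ensemble could select a preference pair from several inpainting candidates. The abstract does not say how scores are combined, so the per-model z-score normalization and plain averaging here are assumptions, and the names `zscore` and `build_preference_pair` are hypothetical.

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardize one reward model's scores so models with different scales are comparable."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All candidates scored identically: this model expresses no preference.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def build_preference_pair(scores_by_model):
    """Pick (winner, loser) candidate indices from an ensemble of reward scores.

    scores_by_model maps a reward-model name to a list of scores,
    one score per candidate image generated for the same input.
    Assumption: the ensemble is the unweighted mean of z-scored rewards.
    """
    normalized = [zscore(scores) for scores in scores_by_model.values()]
    ensemble = [mean(column) for column in zip(*normalized)]
    indices = range(len(ensemble))
    winner = max(indices, key=ensemble.__getitem__)
    loser = min(indices, key=ensemble.__getitem__)
    return winner, loser, ensemble


if __name__ == "__main__":
    # Hypothetical scores from two reward models for three candidates.
    scores = {"model_a": [0.1, 0.5, 0.9], "model_b": [10, 20, 40]}
    winner, loser, _ = build_preference_pair(scores)
    print(winner, loser)
```

Increasing the number of candidates per input corresponds to the candidate scaling mentioned in the abstract; the selected winner and loser would then form one training pair for direct preference optimization.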
CV · Sep 27, 2025
Follow-Your-Preference: Towards Preference-Aligned Image Inpainting
Yutao Shen, Junkun Yuan, Toru Aonishi et al. · tencent-ai
This paper investigates image inpainting with preference alignment. Instead of introducing a novel method, we go back to basics and revisit fundamental problems in achieving such alignment. We use the prominent direct preference optimization approach for alignment training and employ public reward models to construct preference training datasets. Experiments are conducted across nine reward models, two benchmarks, and two baseline models with different structures and generative algorithms. Our key findings are as follows: (1) Most reward models deliver valid reward scores for constructing preference data, even if some of them are not reliable evaluators. (2) Preference data demonstrates robust trends in both candidate scaling and sample scaling across models and benchmarks. (3) Observable biases in reward models, particularly in brightness, composition, and color scheme, make them prone to causing reward hacking. (4) A simple ensemble of these models yields robust and generalizable results by mitigating such biases. Building on these observations, our alignment models significantly outperform prior models across standard metrics, GPT-4 assessments, and human evaluations, without any changes to model structures or the use of new datasets. We hope our work can set a simple yet solid baseline that advances this promising frontier. Our code is open-sourced at: https://github.com/shenytzzz/Follow-Your-Preference.
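For reference, the generic direct preference optimization objective can be written as below. This is the standard form, not necessarily the exact variant used for the inpainting models in this paper, which the abstract does not specify. The symbols are assumptions introduced here: $c$ is the conditioning input (for example, masked image and prompt), $x^w$ and $x^l$ are the preferred and rejected outputs, $\pi_\theta$ is the model being trained, $\pi_{\mathrm{ref}}$ is the frozen reference model, $\sigma$ is the logistic function, and $\beta$ controls how far the trained model may drift from the reference.

```latex
\mathcal{L}_{\mathrm{DPO}}(\theta)
  = -\,\mathbb{E}_{(c,\,x^w,\,x^l)}
    \left[
      \log \sigma\!\left(
        \beta \log \frac{\pi_\theta(x^w \mid c)}{\pi_{\mathrm{ref}}(x^w \mid c)}
        - \beta \log \frac{\pi_\theta(x^l \mid c)}{\pi_{\mathrm{ref}}(x^l \mid c)}
      \right)
    \right]
```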
MTRL-SCI · May 24, 2023
Detection of Non-uniformity in Parameters for Magnetic Domain Pattern Generation by Machine Learning
Naoya Mamada, Masaichiro Mizumaki, Ichiro Akai et al.
We estimate the spatial distribution of heterogeneous physical parameters involved in the formation of magnetic domain patterns of polycrystalline thin films by using convolutional neural networks. We propose a method to obtain a spatial map of physical parameters by estimating the parameters from patterns within a small subregion window of the full magnetic domain and subsequently shifting this window. To enhance the accuracy of parameter estimation in such subregions, we employ large-scale models utilized for natural image classification and exploit the benefits of pretraining. Using a model with high estimation accuracy on these subregions, we conduct inference on simulation data featuring spatially varying parameters and demonstrate the capability to detect such parameter variations.
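The window-shifting procedure described above can be sketched as follows. This is an illustration only: `parameter_map` and `mean_intensity` are hypothetical names, the window size and stride are free choices, and `mean_intensity` merely stands in for the trained convolutional network that would estimate a physical parameter from each subregion.

```python
def parameter_map(image, estimate, window, stride):
    """Slide a square window over a 2D pattern and estimate one value per position.

    image    : list of equal-length rows (a 2D grid of pixel values)
    estimate : callable taking a window-sized patch and returning a number;
               in the paper's setting this would be the trained CNN (assumption)
    window   : side length of the square subregion
    stride   : shift between consecutive window positions
    """
    height, width = len(image), len(image[0])
    rows = (height - window) // stride + 1
    cols = (width - window) // stride + 1
    result = []
    for i in range(rows):
        row = []
        for j in range(cols):
            top, left = i * stride, j * stride
            patch = [r[left:left + window] for r in image[top:top + window]]
            row.append(estimate(patch))
        result.append(row)
    return result


def mean_intensity(patch):
    """Placeholder estimator: average pixel value of the patch."""
    values = [v for r in patch for v in r]
    return sum(values) / len(values)


if __name__ == "__main__":
    # Toy 4x4 "pattern" with values 0..15 in row-major order.
    image = [[4 * r + c for c in range(4)] for r in range(4)]
    print(parameter_map(image, mean_intensity, window=2, stride=2))
```

A smaller stride gives a denser spatial map at the cost of more estimator calls.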