CV · AI · CL · HC · LG · Mar 16, 2023

HIVE: Harnessing Human Feedback for Instructional Visual Editing

Apple
arXiv:2303.09618v2 · 187 citations · h-index: 112
AI Analysis

This paper addresses the misalignment between instructional image editing models and user intent. The contribution is incremental, however, since it adapts existing human feedback methods from the text domain to the visual domain.

The paper tackles the problem of aligning instructional image editing models with human preferences by introducing HIVE, a framework that collects human feedback to learn a reward function and fine-tune diffusion models, resulting in a significant improvement over previous state-of-the-art approaches.

Incorporating human feedback has been shown to be crucial to align text generated by large language models to human preferences. We hypothesize that state-of-the-art instructional image editing models, where outputs are generated based on an input image and an editing instruction, could similarly benefit from human feedback, as their outputs may not adhere to the correct instructions and preferences of users. In this paper, we present a novel framework to harness human feedback for instructional visual editing (HIVE). Specifically, we collect human feedback on the edited images and learn a reward function to capture the underlying user preferences. We then introduce scalable diffusion model fine-tuning methods that can incorporate human preferences based on the estimated reward. Besides, to mitigate the bias brought by the limitation of data, we contribute a new 1M training dataset, a 3.6K reward dataset for rewards learning, and a 1K evaluation dataset to boost the performance of instructional image editing. We conduct extensive empirical experiments quantitatively and qualitatively, showing that HIVE is favored over previous state-of-the-art instructional image editing approaches by a large margin.
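The abstract names two steps, learning a reward function from human feedback and fine-tuning a diffusion model using the estimated reward, but does not state the loss functions. The sketch below illustrates one common way such a pipeline is built, assuming a Bradley-Terry pairwise preference loss for the reward model and exponential reward weighting of the per-sample training loss. These choices, and the names `pairwise_preference_loss`, `reward_weights`, `reward_weighted_loss`, and `temperature`, are illustrative assumptions rather than the paper's confirmed formulation.

```python
import math


def pairwise_preference_loss(r_preferred, r_rejected):
    """Bradley-Terry negative log-likelihood that the preferred edit wins.

    Assumed form (not taken from the paper): -log(sigmoid(r_p - r_r)).
    """
    diff = r_preferred - r_rejected
    # Numerically stable softplus(-diff) = log(1 + exp(-diff)).
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))


def reward_weights(rewards, temperature=1.0):
    """Exponential weights exp(r / temperature), rescaled to have mean 1."""
    top = max(rewards)  # subtract the max to avoid overflow in exp
    raw = [math.exp((r - top) / temperature) for r in rewards]
    mean_raw = sum(raw) / len(raw)
    return [w / mean_raw for w in raw]


def reward_weighted_loss(per_sample_losses, rewards, temperature=1.0):
    """Average of per-sample losses, each scaled by its reward weight.

    `per_sample_losses` stands in for the denoising loss of each training
    example; higher-reward examples contribute more to the average.
    """
    weights = reward_weights(rewards, temperature)
    total = sum(w * l for w, l in zip(weights, per_sample_losses))
    return total / len(per_sample_losses)
```

With equal rewards, the weighted loss reduces to the plain mean, so the reward only changes training when annotators actually distinguish between edits. A lower `temperature` concentrates the weight on the highest-reward examples.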

Code Implementations: 1 repo
