CV | Jan 31, 2024

Motion Guidance: Diffusion-Based Image Editing with Differentiable Motion Estimators

arXiv:2401.18085v1 | 46 citations | h-index: 6 | ICLR
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of fine-grained image editing with diffusion models, though it is an incremental improvement over existing methods.

The paper tackles the problem of precisely editing object layout, position, pose, and shape in images using diffusion models. It proposes motion guidance, a zero-shot technique that uses an off-the-shelf optical flow network to steer the sampling process, and reports high-quality edits for complex motions.

Diffusion models are capable of generating impressive images conditioned on text descriptions, and extensions of these models allow users to edit images at a relatively coarse scale. However, precisely editing the layout, position, pose, and shape of objects in images with diffusion models remains difficult. To this end, we propose motion guidance, a zero-shot technique that allows a user to specify dense, complex motion fields that indicate where each pixel in an image should move. Motion guidance works by steering the diffusion sampling process with the gradients through an off-the-shelf optical flow network. Specifically, we design a guidance loss that encourages the sample to have the desired motion, as estimated by a flow network, while also being visually similar to the source image. By simultaneously sampling from a diffusion model and guiding the sample to have low guidance loss, we can obtain a motion-edited image. We demonstrate that our technique works on complex motions and produces high-quality edits of real and generated images.
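To make the two-part guidance loss concrete, here is a minimal NumPy sketch. It is an illustration under assumptions, not the authors' implementation: the abstract does not give the exact form of the loss, so the L1 terms, the weights `w_flow` and `w_color`, the nearest-neighbor lookup, and the `estimate_flow` callable (a stand-in for the off-the-shelf optical flow network) are all hypothetical. The sketch only evaluates the loss; in the method described above, it is the gradient of such a loss with respect to the sample that steers diffusion sampling, which would require an autodiff framework.

```python
import numpy as np

def sample_at_displaced(image, flow):
    """Look up image[y + dy, x + dx] for every pixel (nearest neighbor).

    image: (H, W, C) array; flow: (H, W, 2) array holding (dx, dy) per pixel.
    Out-of-bounds lookups are clamped to the image border.
    """
    h, w = flow.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    tx = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, w - 1)
    ty = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, h - 1)
    return image[ty, tx]

def guidance_loss(source, sample, target_flow, estimate_flow,
                  w_flow=1.0, w_color=1.0):
    """Assumed form: weighted sum of a motion term and an appearance term.

    estimate_flow(source, sample) is a hypothetical stand-in for a flow
    network; it must return an (H, W, 2) flow from source to sample.
    """
    est = estimate_flow(source, sample)
    # Motion term: the estimated flow should match the user-specified flow.
    flow_term = np.abs(est - target_flow).mean()
    # Appearance term: each source pixel should look like the sample pixel
    # it is estimated to have moved to.
    color_term = np.abs(sample_at_displaced(sample, est) - source).mean()
    return w_flow * flow_term + w_color * color_term
```

A sample whose content has actually moved along the target flow should score lower than an unedited copy of the source, which is the signal the guidance gradient would follow.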

