CV, Nov 23, 2025

Point-to-Point: Sparse Motion Guidance for Controllable Video Editing

Yeji Song, Jaehyun Lee, Mijin Koo, JunHoo Lee, Nojun Kwak

arXiv:2511.18277v1

Originality: Incremental advance

AI Analysis

This work addresses motion preservation in video editing, where edits must stay faithful to both the new subject and the original motion. It is an incremental improvement over existing methods that refines the motion representation.

The paper tackles the challenge of preserving motion while editing subjects in videos. It introduces anchor tokens, a motion representation that captures essential motion patterns using a video diffusion model's prior. The authors report more controllable and semantically aligned edits, with superior edit and motion fidelity.

Accurately preserving motion while editing a subject remains a core challenge in video editing tasks. Existing methods often face a trade-off between edit and motion fidelity, as they rely on motion representations that are either overfitted to the layout or only implicitly defined. To overcome this limitation, we revisit point-based motion representation. However, identifying meaningful points remains challenging without human input, especially across diverse video scenarios. To address this, we propose a novel motion representation, anchor tokens, that capture the most essential motion patterns by leveraging the rich prior of a video diffusion model. Anchor tokens encode video dynamics compactly through a small number of informative point trajectories and can be flexibly relocated to align with new subjects. This allows our method, Point-to-Point, to generalize across diverse scenarios. Extensive experiments demonstrate that anchor tokens lead to more controllable and semantically aligned video edits, achieving superior performance in terms of edit and motion fidelity.
