CV | Oct 14, 2025

VidMP3: Video Editing by Representing Motion with Pose and Position Priors

arXiv:2510.12069v1 | h-index: 116 | Has Code | 2025 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)
Originality: Incremental advance
AI Analysis

The work addresses a domain-specific need of video creators by enabling more flexible and consistent editing, but it is incremental because it builds on existing diffusion-based methods.

The paper tackles motion-preserved video editing for flexible object swapping. It introduces VidMP3, which uses pose and position priors to learn motion representations; the authors report improved temporal consistency and reduced identity drift compared to existing methods.

Motion-preserved video editing is crucial for creators, particularly in scenarios that demand flexibility in both the structure and semantics of swapped objects. Despite its potential, this area remains underexplored. Existing diffusion-based editing methods excel in structure-preserving tasks, using dense guidance signals to ensure content integrity. While some recent methods attempt to address structure-variable editing, they often suffer from issues such as temporal inconsistency, subject identity drift, and the need for human intervention. To address these challenges, we introduce VidMP3, a novel approach that leverages pose and position priors to learn a generalized motion representation from source videos. Our method enables the generation of new videos that maintain the original motion while allowing for structural and semantic flexibility. Both qualitative and quantitative evaluations demonstrate the superiority of our approach over existing methods. The code will be made publicly available at https://github.com/sandeep-sm/VidMP3.
