Deformable Sprites for Unsupervised Video Decomposition
This method addresses video decomposition for unsupervised editing. The approach is new, although its gain in flexibility over prior work is incremental.
The paper tackles the problem of extracting persistent elements from dynamic scenes in videos. It introduces Deformable Sprites, which represent each element with a 2D texture, per-frame masks, and non-rigid deformations. This representation enables applications such as consistent video editing without requiring large datasets or pre-trained models.
We describe a method to extract persistent elements of a dynamic scene from an input video. We represent each scene element as a \emph{Deformable Sprite} consisting of three components: 1) a 2D texture image for the entire video, 2) per-frame masks for the element, and 3) non-rigid deformations that map the texture image into each video frame. The resulting decomposition allows for applications such as consistent video editing. Deformable Sprites are a type of video auto-encoder model that is optimized on each individual video; it requires neither training on a large dataset nor pre-trained models. Moreover, our method does not require object masks or other user input, and it discovers a wider variety of moving objects than previous work. We evaluate our approach on standard video datasets and show qualitative results on a diverse array of Internet videos. Code and video results can be found at https://deformable-sprites.github.io
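The following minimal NumPy sketch makes the three-part representation concrete by showing how a single frame could be rendered from a set of sprites. It goes beyond what the abstract states in several ways. It assumes that each deformation is given as a dense per-pixel map into texture coordinates, that textures are sampled bilinearly, and that the per-frame masks come from per-layer logits passed through a softmax, so they sum to one at every pixel. The names `bilinear_sample`, `identity_warp`, and `render_frame` are hypothetical and are not taken from the authors' code.

```python
import numpy as np

def bilinear_sample(texture, coords):
    """Sample an (H_t, W_t, C) texture at float (x, y) coords of shape (H, W, 2).

    Uses bilinear interpolation; out-of-range coordinates are clamped to the edge.
    """
    h_t, w_t = texture.shape[:2]
    x = np.clip(coords[..., 0], 0, w_t - 1)
    y = np.clip(coords[..., 1], 0, h_t - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w_t - 1)
    y1 = np.minimum(y0 + 1, h_t - 1)
    wx = (x - x0)[..., None]  # horizontal interpolation weight
    wy = (y - y0)[..., None]  # vertical interpolation weight
    top = (1 - wx) * texture[y0, x0] + wx * texture[y0, x1]
    bottom = (1 - wx) * texture[y1, x0] + wx * texture[y1, x1]
    return (1 - wy) * top + wy * bottom

def identity_warp(h, w):
    """Hypothetical helper: a warp mapping each frame pixel to the same texture pixel."""
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    return np.stack([xs, ys], axis=-1)

def render_frame(textures, mask_logits, warps):
    """Composite one frame from L sprites (assumed compositing rule, not the paper's).

    textures:    list of L arrays, each (H_t, W_t, C), shared across the whole video
    mask_logits: (L, H, W) per-frame, per-layer logits
    warps:       (L, H, W, 2) per-frame maps from frame pixels to texture coords
    """
    # Softmax over layers so that masks are non-negative and sum to 1 per pixel.
    logits = mask_logits - mask_logits.max(axis=0, keepdims=True)
    masks = np.exp(logits)
    masks /= masks.sum(axis=0, keepdims=True)
    frame = np.zeros(mask_logits.shape[1:] + (textures[0].shape[-1],))
    for tex, m, w in zip(textures, masks, warps):
        # Warp the layer's texture into the frame, then weight it by its mask.
        frame += m[..., None] * bilinear_sample(tex, w)
    return frame
```

Fitting such a model to one video would then mean optimizing the textures, mask logits, and warp parameters so that rendered frames reproduce the input frames, which matches the per-video auto-encoder setting described above. That optimization loop, and whatever parameterization the paper actually uses for the deformations, is omitted here.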