CV · AI · Aug 19, 2021

Click to Move: Controlling Video Generation with Sparse Motion

arXiv:2108.08815v1 · 16 citations · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the need for interactive video synthesis tools. It is an incremental advance, since it builds on prior motion control methods.

The paper tackles the problem of user-controlled video generation by introducing Click to Move (C2M), a framework that allows users to specify object trajectories via mouse clicks, and it outperforms existing methods on two public datasets.

This paper introduces Click to Move (C2M), a novel framework for video generation where the user can control the motion of the synthesized video through mouse clicks specifying simple object trajectories of the key objects in the scene. Our model receives as input an initial frame, its corresponding segmentation map and the sparse motion vectors encoding the input provided by the user. It outputs a plausible video sequence starting from the given frame and with a motion that is consistent with user input. Notably, our proposed deep architecture incorporates a Graph Convolution Network (GCN) modelling the movements of all the objects in the scene in a holistic manner and effectively combining the sparse user motion information and image features. Experimental results show that C2M outperforms existing methods on two publicly available datasets, thus demonstrating the effectiveness of our GCN framework at modelling object interactions. The source code is publicly available at https://github.com/PierfrancescoArdino/C2M.
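To make the input format described above more concrete, here is a minimal sketch of how mouse clicks might be turned into sparse motion vectors tied to a segmentation map. It is an illustration only, not the authors' implementation: the `Click` class, the `encode_sparse_motion` function, the drag-style start/end representation, and the toy 4x4 segmentation map are all assumptions introduced here. The actual encoding used by C2M is in the linked repository.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Click:
    """Hypothetical user input: a drag from (x0, y0) to (x1, y1) in pixels."""
    start: Tuple[int, int]
    end: Tuple[int, int]


def encode_sparse_motion(
    clicks: List[Click],
    segmentation: List[List[int]],
) -> Tuple[List[List[Tuple[float, float]]], Dict[int, Tuple[float, float]]]:
    """Build an H x W motion field that is zero everywhere except at clicked
    pixels, plus one displacement per clicked object id (assumed layout)."""
    height, width = len(segmentation), len(segmentation[0])
    field = [[(0.0, 0.0) for _ in range(width)] for _ in range(height)]
    per_object: Dict[int, Tuple[float, float]] = {}
    for click in clicks:
        x0, y0 = click.start
        x1, y1 = click.end
        if not (0 <= x0 < width and 0 <= y0 < height):
            continue  # ignore clicks that start outside the frame
        displacement = (float(x1 - x0), float(y1 - y0))
        field[y0][x0] = displacement
        # Look up which object the click landed on via the segmentation map.
        per_object[segmentation[y0][x0]] = displacement
    return field, per_object


# Toy 4x4 segmentation map: 0 = background, 1 and 2 = two objects.
segmentation = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [2, 2, 0, 0],
]
clicks = [Click(start=(1, 1), end=(3, 1)), Click(start=(0, 3), end=(0, 2))]
field, per_object = encode_sparse_motion(clicks, segmentation)
```

In this sketch, `per_object` maps each clicked object to a single displacement, which is the kind of per-object signal a graph-based model could attach to its nodes. Objects the user did not click have no entry, and in the paper it is the GCN that is described as modelling the movement of all objects jointly.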

Code Implementations: 1 repo
