CV · Apr 1, 2020

Future Video Synthesis with Object Motion Prediction

arXiv:2004.00542v2 · 100 citations
AI Analysis

This work addresses video synthesis for applications like autonomous driving or surveillance by improving prediction quality, though it appears incremental as it builds on existing decoupling approaches.

The paper predicts future video frames by decoupling the background from moving objects. It models the background with non-rigid deformation and each moving object with an affine transformation, then combines the two predictions into the output frame. The authors report fewer tearing and distortion artifacts than other approaches, and results on the Cityscapes and KITTI datasets that outperform state-of-the-art methods in visual quality and accuracy.

We present an approach to predict future video frames given a sequence of continuous video frames in the past. Instead of synthesizing images directly, our approach is designed to understand the complex scene dynamics by decoupling the background scene and moving objects. The appearance of the scene components in the future is predicted by non-rigid deformation of the background and affine transformation of moving objects. The anticipated appearances are combined to create a reasonable video in the future. With this procedure, our method exhibits far fewer tearing or distortion artifacts compared to other approaches. Experimental results on the Cityscapes and KITTI datasets show that our model outperforms the state-of-the-art in terms of visual quality and accuracy.
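
The decoupling described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`warp_background`, `warp_object`, `compose`), the use of a dense flow field to stand in for non-rigid deformation, nearest-neighbour sampling, and single-channel images are all assumptions made for illustration. In the paper the motion would be predicted by a learned model; here the flow and the affine matrix are simply given as inputs.

```python
import numpy as np

def warp_background(bg, flow):
    """Backward-warp a grayscale background (H, W) with a dense flow (H, W, 2).

    flow[..., 0] is the x displacement and flow[..., 1] the y displacement.
    A per-pixel flow is one simple way to represent a non-rigid deformation.
    """
    h, w = bg.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Each output pixel reads from the location it came from (nearest neighbour).
    src_x = np.clip(np.rint(xs - flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.rint(ys - flow[..., 1]).astype(int), 0, h - 1)
    return bg[src_y, src_x]

def warp_object(obj, mask, A):
    """Move an object layer and its boolean mask with a 2x3 affine matrix A.

    A is the forward map: [x', y'] = A @ [x, y, 1].
    """
    h, w = mask.shape
    # Invert the affine map so every output pixel can look up its source pixel.
    A_inv = np.linalg.inv(np.vstack([A, [0.0, 0.0, 1.0]]))
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    src = A_inv @ coords
    sx = np.rint(src[0]).astype(int).reshape(h, w)
    sy = np.rint(src[1]).astype(int).reshape(h, w)
    # Pixels whose source falls outside the image do not belong to the object.
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    sx = np.clip(sx, 0, w - 1)
    sy = np.clip(sy, 0, h - 1)
    new_mask = mask[sy, sx] & valid
    new_obj = np.where(new_mask, obj[sy, sx], 0)
    return new_obj, new_mask

def compose(bg, obj, mask):
    """Paste the moved object over the deformed background."""
    return np.where(mask, obj, bg)

# Toy example: the background drifts one pixel right, the object moves by (+2, +1).
bg = np.arange(36).reshape(6, 6)
flow = np.zeros((6, 6, 2))
flow[..., 0] = 1.0
mask = np.zeros((6, 6), dtype=bool)
mask[1, 1] = True
obj = np.where(mask, 99, 0)
A = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 1.0]])

next_bg = warp_background(bg, flow)
next_obj, next_mask = warp_object(obj, mask, A)
next_frame = compose(next_bg, next_obj, next_mask)
```

Because the object is moved as a single rigid-like unit rather than pixel by pixel, it keeps its shape in the composed frame, which is the intuition behind the reduced tearing the abstract mentions.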

Code Implementations: 1 repo
