LG CV ML · Jun 23, 2020

Learning Disentangled Representations of Video with Missing Data

Armand Comas-Massagué, Chi Zhang, Zlatan Feric, Octavia Camps, Rose Yu

arXiv:2006.13391v2 · 17 citations · Has Code

Originality: Incremental advance

AI Analysis

This work addresses missing data in video analysis, with applications such as object tracking. It appears incremental, since it builds on existing disentanglement and imputation methods.

The paper tackles the problem of learning video representations with missing data by introducing DIVE, a deep generative model that imputes missing frames and predicts future ones. DIVE outperforms state-of-the-art baselines by a substantial margin on Moving MNIST, and comparisons on the real-world MOTSChallenge pedestrian dataset show its practical value.

Missing data poses significant challenges when learning representations of video sequences. We present Disentangled Imputed Video autoEncoder (DIVE), a deep generative model that imputes and predicts future video frames in the presence of missing data. Specifically, DIVE introduces a missingness latent variable and disentangles the hidden video representation of each object into static appearance, dynamic appearance, pose, and missingness factors. DIVE imputes each object's trajectory where data is missing. On a moving MNIST dataset with various missing-data scenarios, DIVE outperforms state-of-the-art baselines by a substantial margin. We also present comparisons on the real-world MOTSChallenge pedestrian dataset, which demonstrate the practical value of our method in a more realistic setting. Our code and data can be found at https://github.com/Rose-STL-Lab/DIVE.
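
To make "imputes each object's trajectory where data is missing" concrete, here is a toy sketch in plain Python. It is not DIVE's implementation: the `ObjectFactors` container and `impute_poses` function are hypothetical names, the missingness indicator is given as a fixed boolean mask rather than inferred as a latent variable, and a constant-velocity extrapolation stands in for DIVE's learned dynamics model. It only illustrates the gating idea: at frames flagged as missing, the observed pose is ignored and a predicted pose is used instead.

```python
from dataclasses import dataclass
from typing import List, Tuple

Pose = Tuple[float, float]  # hypothetical pose: (x, y) centre of one object


@dataclass
class ObjectFactors:
    # Hypothetical container mirroring the per-object factors named in the abstract.
    static_appearance: List[float]         # one vector shared by all frames
    dynamic_appearance: List[List[float]]  # one vector per frame
    poses: List[Pose]                      # one (possibly corrupted) pose per frame
    missing: List[bool]                    # missingness indicator per frame


def impute_poses(obj: ObjectFactors) -> List[Pose]:
    """Keep poses at observed frames; at missing frames, extrapolate with
    constant velocity from the last two poses in the output (a stand-in
    for a learned dynamics model, not DIVE's actual predictor)."""
    out: List[Pose] = []
    for pose, is_missing in zip(obj.poses, obj.missing):
        if not is_missing:
            out.append(pose)                     # trust the observation
        elif len(out) >= 2:
            (x1, y1), (x2, y2) = out[-2], out[-1]
            out.append((2 * x2 - x1, 2 * y2 - y1))  # constant-velocity step
        elif out:
            out.append(out[-1])                  # one prior pose: hold it
        else:
            out.append((0.0, 0.0))               # hypothetical default with no history
    return out


# Usage: frames 2 and 3 are missing, so their corrupted poses are replaced.
obj = ObjectFactors(
    static_appearance=[0.1],
    dynamic_appearance=[[0.0]] * 5,
    poses=[(0.0, 0.0), (1.0, 0.5), (9.9, 9.9), (9.9, 9.9), (4.0, 2.0)],
    missing=[False, False, True, True, False],
)
print(impute_poses(obj))
# → [(0.0, 0.0), (1.0, 0.5), (2.0, 1.0), (3.0, 1.5), (4.0, 2.0)]
```

In DIVE itself, the missingness variable is part of the learned latent representation, so the model must also decide which frames are missing; the sketch skips that step by taking the mask as input.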
