LG | CV | ML | Jun 23, 2020

Learning Disentangled Representations of Video with Missing Data

arXiv:2006.13391v2 | 17 citations | Has Code
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of handling missing data in video analysis for applications like object tracking, though it appears incremental because it builds on existing disentanglement and imputation methods.

The paper tackles the problem of learning video representations with missing data by introducing DIVE, a deep generative model that imputes missing frames and predicts future ones, achieving substantial improvements over state-of-the-art baselines on moving MNIST and real-world pedestrian datasets.

Missing data poses significant challenges for learning representations of video sequences. We present Disentangled Imputed Video autoEncoder (DIVE), a deep generative model that imputes and predicts future video frames in the presence of missing data. Specifically, DIVE introduces a missingness latent variable and disentangles the hidden video representations into static and dynamic appearance, pose, and missingness factors for each object. DIVE imputes each object's trajectory where data is missing. On a moving MNIST dataset with various missing-data scenarios, DIVE outperforms state-of-the-art baselines by a substantial margin. We also present comparisons on the real-world MOTSChallenge pedestrian dataset, which demonstrate the practical value of our method in a more realistic setting. Our code and data can be found at https://github.com/Rose-STL-Lab/DIVE.
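
To make the idea of imputing an object's trajectory where data is missing more concrete, here is a minimal sketch in plain Python. It is not DIVE and does not reproduce the code in the linked repository: it replaces the learned generative model with simple linear interpolation, and the names `impute_trajectory`, `Pose`, and the 0/1 mask convention are assumptions made only for this illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical pose type for this sketch: an (x, y) object position in a frame.
Pose = Tuple[float, float]


def impute_trajectory(
    poses: List[Optional[Pose]],
) -> Tuple[List[Pose], List[int]]:
    """Fill missing poses by linear interpolation between observed frames.

    A pose of None marks a frame where the object was not observed.
    Returns the filled trajectory and a 0/1 missingness mask (1 = was missing).
    Gaps at the start or end are filled with the nearest observed pose.
    """
    mask = [1 if p is None else 0 for p in poses]
    observed = [i for i, m in enumerate(mask) if m == 0]
    if not observed:
        raise ValueError("at least one observed pose is required")

    filled: List[Pose] = []
    for t, p in enumerate(poses):
        if p is not None:
            filled.append(p)
            continue
        # Nearest observed frames before and after the gap, if any.
        prev = max((i for i in observed if i < t), default=None)
        nxt = min((i for i in observed if i > t), default=None)
        if prev is None:
            filled.append(poses[nxt])
        elif nxt is None:
            filled.append(poses[prev])
        else:
            w = (t - prev) / (nxt - prev)  # interpolation weight in (0, 1)
            (x0, y0), (x1, y1) = poses[prev], poses[nxt]
            filled.append((x0 + w * (x1 - x0), y0 + w * (y1 - y0)))
    return filled, mask


if __name__ == "__main__":
    trajectory = [(0.0, 0.0), None, (2.0, 4.0), None]
    filled, mask = impute_trajectory(trajectory)
    print(filled)
    print(mask)
```

In this toy version the mask is given by the input, whereas the abstract describes missingness as a latent variable that the model itself infers per object; the sketch only illustrates the bookkeeping of a per-frame missingness indicator alongside an imputed pose sequence.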

Code Implementations: 1 repo