Disentangling Video with Independent Prediction
This work addresses the challenge of interpretable video analysis for computer vision researchers, but it appears incremental because it builds on existing variational models.
The paper tackles unsupervised disentanglement of video into independent factors, where each factor's future can be predicted from its own past without considering the others, and shows that the approach often learns factors that are interpretable as objects in a scene.
We propose an unsupervised variational model for disentangling video into independent factors, i.e., each factor's future can be predicted from its own past without considering the others. We show that our approach often learns factors that are interpretable as objects in a scene.
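
The independence property stated in the abstract can be illustrated with a toy sketch. This is not the paper's variational model: the factors here are given rather than inferred from video, the predictor is a hand-written constant-velocity rule rather than a learned one, and every name (`State`, `Predictor`, `constant_velocity`, `predict_independently`) is hypothetical. The sketch only shows what "predicted from its past without considering the others" means structurally: each factor's predictor receives that factor's history and nothing else.

```python
from typing import Callable, List, Sequence

# Hypothetical types for illustration: a factor's state is a list of floats,
# and a predictor maps that factor's own past states to its next state.
State = List[float]
Predictor = Callable[[Sequence[State]], State]


def constant_velocity(past: Sequence[State]) -> State:
    """Toy per-factor predictor: extrapolate linearly from the last two states."""
    if len(past) < 2:
        return list(past[-1])
    prev, last = past[-2], past[-1]
    return [2.0 * b - a for a, b in zip(prev, last)]


def predict_independently(
    pasts: Sequence[Sequence[State]], predictors: Sequence[Predictor]
) -> List[State]:
    """Predict each factor's next state from that factor's own past only."""
    return [predict(past) for predict, past in zip(predictors, pasts)]


# Two factors (for example, two objects), each with a 2-D state at two time steps.
pasts = [
    [[0.0, 0.0], [1.0, 0.5]],  # factor 0
    [[5.0, 5.0], [4.0, 5.0]],  # factor 1
]
predictors = [constant_velocity, constant_velocity]
next_states = predict_independently(pasts, predictors)

# Perturbing factor 1's past leaves factor 0's prediction unchanged.
perturbed = [pasts[0], [[9.0, 9.0], [7.0, 3.0]]]
assert predict_independently(perturbed, predictors)[0] == next_states[0]
```

Because no predictor ever sees another factor's history, changing one factor's past cannot affect any other factor's prediction, which is the structural constraint the abstract describes.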