LG · CV · ML · Apr 11, 2019

Keyframing the Future: Keyframe Discovery for Visual Prediction and Planning

arXiv:1904.05869v2 · 31 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of focusing on informative events in temporal data for applications like visual prediction and planning, representing an incremental improvement in hierarchical modeling.

The paper tackles the problem of identifying essential moments in videos by proposing a hierarchical Keyframe-Inpainter (KeyIn) model that discovers keyframes and inpaints the intervening frames. It reports that KeyIn finds informative keyframes on several datasets and outperforms other recent hierarchical predictive models for planning.

Temporal observations such as videos contain essential information about the dynamics of the underlying scene, but they are often interleaved with inessential, predictable details. One way of dealing with this problem is by focusing on the most informative moments in a sequence. We propose a model that learns to discover these important events and the times when they occur and uses them to represent the full sequence. We do so using a hierarchical Keyframe-Inpainter (KeyIn) model that first generates a video's keyframes and then inpaints the rest by generating the frames at the intervening times. We propose a fully differentiable formulation to efficiently learn this procedure. We show that KeyIn finds informative keyframes in several datasets with different dynamics and visual properties. KeyIn outperforms other recent hierarchical predictive models for planning. For more details, please see the project website at https://sites.google.com/view/keyin.
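
As a rough illustration of the two-stage idea described in the abstract (and not the authors' implementation), the sketch below takes a few keyframes and their times as given and fills in the intervening frames. In KeyIn the keyframes, their timing, and the inpainter are all learned; here the function name `inpaint_from_keyframes`, the hand-picked inputs, and the use of linear interpolation as a stand-in inpainter are assumptions made purely for illustration.

```python
from bisect import bisect_left


def inpaint_from_keyframes(keyframes, times, length):
    """Fill a sequence of `length` frames from keyframes placed at `times`.

    Hypothetical helper: linear interpolation stands in for a learned
    inpainter. Assumes at least two keyframes, strictly increasing times,
    and frames given as flat lists of floats.
    """
    frames = []
    for t in range(length):
        # Pick the pair of keyframes that brackets time t.
        hi = min(max(bisect_left(times, t), 1), len(times) - 1)
        lo = hi - 1
        # Blend weight between the two keyframes, clamped to [0, 1].
        w = (t - times[lo]) / (times[hi] - times[lo])
        w = min(max(w, 0.0), 1.0)
        frames.append(
            [(1 - w) * a + w * b for a, b in zip(keyframes[lo], keyframes[hi])]
        )
    return frames


# Three one-pixel "keyframes" at times 0, 5, and 9 of a 10-frame sequence.
video = inpaint_from_keyframes([[0.0], [10.0], [0.0]], [0, 5, 9], 10)
```

The point of the sketch is only the division of labour: a small set of (frame, time) pairs summarises the sequence, and everything between them is reconstructed from its two neighbouring keyframes.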

