CVApr 2, 2021

Video Prediction Recalling Long-term Motion Context via Memory Alignment Learning

arXiv:2104.00924v1133 citations
Originality Incremental advance
AI Analysis

This work addresses video prediction challenges for applications like autonomous systems or surveillance, but it appears incremental as it builds on existing RNN-based methods with specific enhancements.

The paper tackles the problem of predicting future video frames by recalling long-term motion contexts, such as distinguishing walking from running, and proposes a method using memory alignment learning and query decomposition to handle limited dynamics and high-dimensional motion. Experimental results show it outperforms other RNN-based methods, particularly in long-term conditions.

Our work addresses long-term motion context issues for predicting future frames. To predict the future precisely, it is required to capture which long-term motion context (e.g., walking or running) the input motion (e.g., leg movement) belongs to. The bottlenecks arising when dealing with the long-term motion context are: (i) how to predict the long-term motion context naturally matching input sequences with limited dynamics, (ii) how to predict the long-term motion context with high-dimensionality (e.g., complex motion). To address the issues, we propose novel motion context-aware video prediction. To solve the bottleneck (i), we introduce a long-term motion context memory (LMC-Memory) with memory alignment learning. The proposed memory alignment learning enables to store long-term motion contexts into the memory and to match them with sequences including limited dynamics. As a result, the long-term context can be recalled from the limited input sequence. In addition, to resolve the bottleneck (ii), we propose memory query decomposition to store local motion context (i.e., low-dimensional dynamics) and recall the suitable local context for each local part of the input individually. It enables to boost the alignment effects of the memory. Experimental results show that the proposed method outperforms other sophisticated RNN-based methods, especially in long-term condition. Further, we validate the effectiveness of the proposed network designs by conducting ablation studies and memory feature analysis. The source code of this work is available.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes