CV · Apr 20, 2022

STAU: A SpatioTemporal-Aware Unit for Video Prediction and Beyond

arXiv:2204.09456v1 · 5 citations · h-index: 68
Originality: Incremental advance
AI Analysis

This paper addresses video prediction and related tasks by improving spatiotemporal modeling. The contribution appears incremental: it builds on existing methods by integrating spatial and temporal information more effectively.

The paper tackles video prediction by proposing a SpatioTemporal-Aware Unit (STAU) that models spatiotemporal correlations. It reports better accuracy and computational efficiency than other methods on video prediction, early action recognition, and object detection.

Video prediction aims to predict future frames by modeling the complex spatiotemporal dynamics in videos. However, most of the existing methods only model the temporal information and the spatial information for videos in an independent manner but haven't fully explored the correlations between both terms. In this paper, we propose a SpatioTemporal-Aware Unit (STAU) for video prediction and beyond by exploring the significant spatiotemporal correlations in videos. On the one hand, the motion-aware attention weights are learned from the spatial states to help aggregate the temporal states in the temporal domain. On the other hand, the appearance-aware attention weights are learned from the temporal states to help aggregate the spatial states in the spatial domain. In this way, the temporal information and the spatial information can be greatly aware of each other in both domains, during which, the spatiotemporal receptive field can also be greatly broadened for more reliable spatiotemporal modeling. Experiments are not only conducted on traditional video prediction tasks but also other tasks beyond video prediction, including the early action recognition and object detection tasks. Experimental results show that our STAU can outperform other methods on all tasks in terms of performance and computation efficiency.
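
The cross-domain attention described in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: STAU is a recurrent unit with learned parameters, and none of that is modeled here. The sketch assumes plain scaled dot-product attention over a short history of flat vectors, and the names `attend`, `spatial_hist`, `temporal_hist`, `spatial_now`, and `temporal_now` are hypothetical. It only shows the idea that weights computed from one kind of state are used to aggregate the other kind.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(query, keys, values):
    """Score `keys` against `query`, then use the weights to average `values`.

    Assumption: scaled dot-product scoring stands in for whatever learned
    scoring the real unit uses.
    """
    scale = math.sqrt(len(query))
    weights = softmax([dot(k, query) / scale for k in keys])
    dim = len(values[0])
    out = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return out, weights


# Toy history of three past steps, each state a 2-d vector (hypothetical data).
spatial_hist = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
temporal_hist = [[0.5, 0.5], [2.0, 0.0], [0.0, 2.0]]
spatial_now = [1.0, 0.0]
temporal_now = [0.0, 1.0]

# "Motion-aware" weights: computed from spatial states, applied to temporal states.
temporal_agg, motion_w = attend(spatial_now, spatial_hist, temporal_hist)

# "Appearance-aware" weights: computed from temporal states, applied to spatial states.
spatial_agg, appearance_w = attend(temporal_now, temporal_hist, spatial_hist)
```

In this sketch each aggregated state is a convex combination of past states from the other domain's counterpart, which is one simple way to read "the temporal information and the spatial information can be aware of each other".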
