Motion Graph Unleashed: A Novel Approach to Video Prediction
This paper addresses efficient and accurate video prediction for applications such as autonomous driving and surveillance, though the contribution appears incremental because it builds on existing motion representation methods.
The paper tackles video prediction by introducing a motion graph representation that transforms video patches into interconnected graph nodes to capture spatial-temporal relationships, achieving state-of-the-art performance on UCF Sports with a 78% reduction in model size and a 47% decrease in GPU memory usage.
We introduce the motion graph, a novel approach to the video prediction problem, in which future video frames are predicted from limited past data. The motion graph transforms patches of video frames into interconnected graph nodes to comprehensively describe the spatial-temporal relationships among them. This representation overcomes the limitations of existing motion representations such as image differences, optical flow, and motion matrices, which either fall short in capturing complex motion patterns or suffer from excessive memory consumption. We further present a video prediction pipeline built on the motion graph that exhibits substantial performance improvements and cost reductions. Experiments on various datasets, including UCF Sports, KITTI, and Cityscapes, highlight the strong representational ability of the motion graph. On UCF Sports in particular, our method matches or outperforms the SOTA methods while reducing model size by 78% and GPU memory utilization by 47%.
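To make the idea of turning patches into interconnected graph nodes more concrete, here is a minimal sketch. It is not the paper's construction: the function name `build_patch_graph`, the `(t, row, col)` node indexing, the 4-neighbour spatial edges, and the fixed `temporal_radius` window are all assumptions chosen for illustration, since the abstract does not specify how nodes are connected.

```python
from itertools import product


def build_patch_graph(num_frames, grid_h, grid_w, temporal_radius=1):
    """Return (nodes, edges) for a toy spatio-temporal patch graph.

    Nodes are (t, row, col) patch indices. The connectivity rule below is an
    illustrative assumption, not the rule used in the paper.
    """
    nodes = list(product(range(num_frames), range(grid_h), range(grid_w)))
    edges = set()
    for t, r, c in nodes:
        # Spatial edges: right and down neighbours in the same frame,
        # so each undirected neighbour pair is stored exactly once.
        if c + 1 < grid_w:
            edges.add(((t, r, c), (t, r, c + 1), "spatial"))
        if r + 1 < grid_h:
            edges.add(((t, r, c), (t, r + 1, c), "spatial"))
        # Temporal edges: patches in the next frame inside a small window
        # centred on the current patch position (assumed rule).
        if t + 1 < num_frames:
            for dr in range(-temporal_radius, temporal_radius + 1):
                for dc in range(-temporal_radius, temporal_radius + 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < grid_h and 0 <= cc < grid_w:
                        edges.add(((t, r, c), (t + 1, rr, cc), "temporal"))
    return nodes, edges


if __name__ == "__main__":
    # Two frames, each split into a 2x2 grid of patches.
    nodes, edges = build_patch_graph(2, 2, 2, temporal_radius=1)
    n_spatial = sum(1 for e in edges if e[2] == "spatial")
    n_temporal = sum(1 for e in edges if e[2] == "temporal")
    print(len(nodes), n_spatial, n_temporal)  # prints: 8 8 16
```

In this toy version the number of temporal edges per node is bounded by the window size rather than by the number of patches in the next frame, which is one plausible way a sparse graph could use less memory than a dense all-pairs representation. Whether the paper obtains its memory savings this way is not stated in the abstract.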