CV | Mar 29, 2025

Real-time Video Prediction With Fast Video Interpolation Model and Prediction Training

Shota Hirose, Kazuki Kotoyori, Kasidis Arunruangsirilert, Fangzheng Lin, Heming Sun, Jiro Katto

arXiv:2503.23185v26.23 citations | h-index: 24 | Has Code | ICIP

Originality: Incremental advance

AI Analysis

This work addresses latency issues for users in real-time video interactions, but it is incremental as it builds on existing frame interpolation models.

The paper tackles the problem of transmission latency in real-time video applications by proposing a fast video prediction method, achieving the best trade-off between prediction accuracy and computational speed among existing methods.

Transmission latency significantly affects users' quality of experience in real-time interaction and actuation. Because latency is in principle unavoidable, video prediction can be used to mitigate it and ultimately enable zero-latency transmission. However, most existing video prediction methods are computationally expensive and impractical for real-time applications. In this work, we therefore propose a real-time video prediction method toward zero-latency interaction over networks, called IFRVP (Intermediate Feature Refinement Video Prediction). First, we propose three training methods for video prediction that extend frame interpolation models, using a simple convolution-only frame interpolation network based on IFRNet. Second, we introduce ELAN-based residual blocks into the prediction models to improve both inference speed and accuracy. Our evaluations show that the proposed models run efficiently and achieve the best trade-off between prediction accuracy and computational speed among existing video prediction methods. A demonstration movie is provided at http://bit.ly/IFRVPDemo. The code will be released at https://github.com/FykAikawa/IFRVP.
