CVJan 5, 2025

MetaNeRV: Meta Neural Representations for Videos with Spatial-Temporal Guidance

arXiv:2501.02427v27 citationsh-index: 9AAAI
Originality Incremental advance
AI Analysis

This work addresses the computational bottleneck in video analysis for researchers and practitioners by reducing training time for NeRV-based methods, though it is incremental as it builds on existing NeRV approaches.

The paper tackles the inefficiency of training separate Neural Representations for Videos (NeRV) models for each video by proposing MetaNeRV, a meta-learning framework that learns an optimal parameter initialization for fast adaptation to unseen videos, achieving superior performance in video representation and compression across multiple datasets.

Neural Representations for Videos (NeRV) has emerged as a promising implicit neural representation (INR) approach for video analysis, which represents videos as neural networks with frame indexes as inputs. However, NeRV-based methods are time-consuming when adapting to a large number of diverse videos, as each video requires a separate NeRV model to be trained from scratch. In addition, NeRV-based methods spatially require generating a high-dimension signal (i.e., an entire image) from the input of a low-dimension timestamp, and a video typically consists of tens of frames temporally that have a minor change between adjacent frames. To improve the efficiency of video representation, we propose Meta Neural Representations for Videos, named MetaNeRV, a novel framework for fast NeRV representation for unseen videos. MetaNeRV leverages a meta-learning framework to learn an optimal parameter initialization, which serves as a good starting point for adapting to new videos. To address the unique spatial and temporal characteristics of video modality, we further introduce spatial-temporal guidance to improve the representation capabilities of MetaNeRV. Specifically, the spatial guidance with a multi-resolution loss aims to capture the information from different resolution stages, and the temporal guidance with an effective progressive learning strategy could gradually refine the number of fitted frames during the meta-learning process. Extensive experiments conducted on multiple datasets demonstrate the superiority of MetaNeRV for video representations and video compression.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes