SSNVC: Single Stream Neural Video Compression with Implicit Temporal Information
This addresses the redundancy and training complexity in neural video compression for video encoding applications, though it appears incremental as it builds on existing NVC methods.
The paper tackles the complexity of neural video compression by proposing a single-stream framework that eliminates motion vector encoder-decoder modules, simplifying training and compression while achieving state-of-the-art performance on benchmarks.
Recently, Neural Video Compression (NVC) techniques have achieved remarkable performance, even surpassing the best traditional lossy video codec. However, most existing NVC methods heavily rely on transmitting Motion Vector (MV) to generate accurate contextual features, which has the following drawbacks. (1) Compressing and transmitting MV requires specialized MV encoder and decoder, which makes modules redundant. (2) Due to the existence of MV Encoder-Decoder, the training strategy is complex. In this paper, we present a noval Single Stream NVC framework (SSNVC), which removes complex MV Encoder-Decoder structure and uses a one-stage training strategy. SSNVC implicitly use temporal information by adding previous entropy model feature to current entropy model and using previous two frame to generate predicted motion information at the decoder side. Besides, we enhance the frame generator to generate higher quality reconstructed frame. Experiments demonstrate that SSNVC can achieve state-of-the-art performance on multiple benchmarks, and can greatly simplify compression process as well as training process.