IV, CV | Mar 25, 2025

GIViC: Generative Implicit Video Compression

arXiv:2503.19604v1 | 11 citations | h-index: 13
Originality: Highly original
AI Analysis

This work addresses the need for more efficient video compression methods for applications like streaming and storage, representing a significant advancement rather than an incremental improvement.

The paper tackles video compression with implicit neural representations (INRs) by proposing GIViC, a generative implicit video compression framework. GIViC achieves BD-rate savings of 15.94% over VVC VTM, 22.46% over DCVC-FM, and 8.52% over NVRC, making it the first INR-based codec to outperform VTM in a Random Access configuration.
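For readers unfamiliar with the metric, the BD-rate figures quoted above summarize the average bitrate difference between two codecs at equal quality. The following is a minimal sketch of the commonly used Bjontegaard calculation (cubic fit of log-rate against PSNR, integrated over the overlapping quality range). It is not taken from the paper: the function name `bd_rate` and the sample rate/PSNR points are hypothetical, and the paper's own evaluation scripts may differ in interpolation details.

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjontegaard delta rate (%) of a test codec relative to an anchor.

    Negative values mean the test codec needs fewer bits for the same
    quality, i.e. a "BD-rate saving". Each curve needs at least four
    rate/PSNR points for the cubic fit.
    """
    log_ra = np.log(np.asarray(rate_anchor, dtype=float))
    log_rt = np.log(np.asarray(rate_test, dtype=float))
    pa = np.asarray(psnr_anchor, dtype=float)
    pt = np.asarray(psnr_test, dtype=float)

    # Fit log-rate as a cubic polynomial of quality for each codec.
    fit_a = np.polyfit(pa, log_ra, 3)
    fit_t = np.polyfit(pt, log_rt, 3)

    # Only the quality interval covered by both curves is compared.
    lo = max(pa.min(), pt.min())
    hi = min(pa.max(), pt.max())

    # Integrate both fitted curves over the shared interval.
    int_a = np.polyint(fit_a)
    int_t = np.polyint(fit_t)
    area_a = np.polyval(int_a, hi) - np.polyval(int_a, lo)
    area_t = np.polyval(int_t, hi) - np.polyval(int_t, lo)

    # Average log-rate gap, converted back to a percentage.
    avg_log_diff = (area_t - area_a) / (hi - lo)
    return (np.exp(avg_log_diff) - 1.0) * 100.0

# Hypothetical rate (kbps) / PSNR (dB) points, for illustration only.
anchor_rate = [1000.0, 2000.0, 4000.0, 8000.0]
anchor_psnr = [32.0, 35.0, 38.0, 41.0]
test_rate = [0.8 * r for r in anchor_rate]  # 20% fewer bits at each point
print(bd_rate(anchor_rate, anchor_psnr, test_rate, anchor_psnr))
```

With the hypothetical points above, the test codec uses 20% fewer bits at every quality level, so the result should be close to -20. Under this sign convention, a reported saving of 15.94% corresponds to a BD-rate of about -15.94.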

While video compression based on implicit neural representations (INRs) has recently demonstrated great potential, existing INR-based video codecs still cannot achieve state-of-the-art (SOTA) performance compared to their conventional or autoencoder-based counterparts given the same coding configuration. In this context, we propose a Generative Implicit Video Compression framework, GIViC, aiming at advancing the performance limits of this type of coding method. GIViC is inspired by the characteristics that INRs share with large language and diffusion models in exploiting long-term dependencies. Through the newly designed implicit diffusion process, GIViC performs diffusive sampling across coarse-to-fine spatiotemporal decompositions, gradually progressing from coarser-grained full-sequence diffusion to finer-grained per-token diffusion. A novel Hierarchical Gated Linear Attention-based transformer (HGLA) is also integrated into the framework, which dual-factorizes global dependency modeling along scale and sequential axes. The proposed GIViC model has been benchmarked against SOTA conventional and neural codecs using a Random Access (RA) configuration (YUV 4:2:0, GOPSize=32), and yields BD-rate savings of 15.94%, 22.46% and 8.52% over VVC VTM, DCVC-FM and NVRC, respectively. As far as we are aware, GIViC is the first INR-based video codec that outperforms VTM under the RA coding configuration. The source code will be made available.
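To give some intuition for the gated linear attention that the HGLA transformer is named after, here is a minimal sketch of a generic single-head gated linear attention recurrence. It is an assumption-laden illustration, not the paper's HGLA: the hierarchical factorization along scale and sequential axes is not reproduced, and the function name, tensor shapes, and the way the gate `g` is supplied are all hypothetical.

```python
import numpy as np

def gated_linear_attention(q, k, v, g):
    """Recurrent form of a generic gated linear attention layer (one head).

    q, k, g: arrays of shape (T, d_k); v: array of shape (T, d_v).
    g holds per-step, per-key-channel decay gates in [0, 1].
    Returns an array of shape (T, d_v).
    """
    T, d_k = q.shape
    d_v = v.shape[1]
    state = np.zeros((d_k, d_v))  # running key-value memory
    out = np.empty((T, d_v))
    for t in range(T):
        # Decay the old memory channel-wise, then add the new key-value pair.
        state = g[t][:, None] * state + np.outer(k[t], v[t])
        # Read the memory with the current query.
        out[t] = q[t] @ state
    return out

# Hypothetical random inputs, for illustration only.
rng = np.random.default_rng(0)
T, d_k, d_v = 6, 4, 3
q = rng.standard_normal((T, d_k))
k = rng.standard_normal((T, d_k))
v = rng.standard_normal((T, d_v))
g = rng.uniform(0.0, 1.0, size=(T, d_k))
print(gated_linear_attention(q, k, v, g).shape)
```

The state has a fixed size regardless of sequence length, so cost grows linearly with the number of tokens. With all gates set to 1 the recurrence reduces to plain causal linear attention, and with all gates set to 0 each output depends only on the current token.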

