Hierarchical Autoregressive Modeling for Neural Video Compression
This work targets the efficiency of video compression for multimedia applications. Its contribution is an incremental improvement obtained by reinterpreting existing models.
The authors connect autoregressive generative models to neural video compression methods and report improved rate-distortion performance over state-of-the-art neural and conventional approaches.
Recent work by Marino et al. (2020) showed improved performance in sequential density estimation by combining masked autoregressive flows with hierarchical latent variable models. We draw a connection between such autoregressive generative models and the task of lossy video compression. Specifically, we view recent neural video compression methods (Lu et al., 2019; Yang et al., 2020b; Agustsson et al., 2020) as instances of a generalized stochastic temporal autoregressive transform, and propose avenues for enhancement based on this insight. Comprehensive evaluations on large-scale video data show improved rate-distortion performance over both state-of-the-art neural and conventional video compression methods.
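To make the idea of a temporal autoregressive transform concrete, the following is a minimal sketch of the shift-and-scale form used by masked autoregressive flows, applied along the time axis. It is an illustration under stated assumptions, not the method of the paper: each "frame" is a single number, and `predict_shift` and `predict_scale` are hypothetical stand-ins for learned networks (for example, a motion-compensated prediction and a learned scale). Quantization and entropy coding of the residuals, which a lossy codec needs, are omitted, so the round trip here is exact.

```python
import math

def predict_shift(prev):
    # Hypothetical stand-in for a learned predictor of the current frame
    # from the previous one (an assumption, not from the source).
    return 0.9 * prev

def predict_scale(prev):
    # Hypothetical stand-in for a learned scale; kept strictly positive
    # so the transform stays invertible.
    return 1.0 + 0.1 * abs(prev)

def encode(frames, x0=0.0):
    """Map frames x_t to residuals y_t = (x_t - shift(x_{t-1})) / scale(x_{t-1})."""
    residuals = []
    prev = x0
    for x in frames:
        residuals.append((x - predict_shift(prev)) / predict_scale(prev))
        prev = x
    return residuals

def decode(residuals, x0=0.0):
    """Invert the transform: x_t = shift(x_{t-1}) + scale(x_{t-1}) * y_t."""
    frames = []
    prev = x0
    for y in residuals:
        x = predict_shift(prev) + predict_scale(prev) * y
        frames.append(x)
        prev = x  # decoding conditions on the frame just reconstructed
    return frames

frames = [1.0, 1.2, 0.8, -0.5]
reconstructed = decode(encode(frames))
```

In this sketch, fixing the scale at 1 reduces the transform to plain predictive residual coding, which is one way to read the claim that existing neural codecs are special cases of a more general transform.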