IV CV ML Jul 9, 2022

Video Coding Using Learned Latent GAN Compression

Mustafa Shukor, Bharath Bhushan Damodaran, Xu Yao, Pierre Hellier

arXiv:2207.04324v29.510 citations h-index: 30

Originality Highly original

AI Analysis

This paper addresses video compression for facial content, offering a novel approach that improves efficiency and quality, though it is incremental in that it applies existing generative models to a specific domain.

The paper tackles facial video compression by using GANs and normalizing flows to learn latent representations, achieving better results than state-of-the-art codecs like VTM and AV1, with significant perceptual distortion reduction at low bit rates.

We propose in this paper a new paradigm for facial video compression. We leverage the generative capacity of GANs such as StyleGAN to represent and compress a video, including intra and inter compression. Each frame is inverted in the latent space of StyleGAN, from which the optimal compression is learned. To do so, a diffeomorphic latent representation is learned using a normalizing flows model, where an entropy model can be optimized for image coding. In addition, we propose a new perceptual loss that is more efficient than other counterparts. Finally, an entropy model for video inter coding with residual is also learned in the previously constructed latent representation. Our method (SGANC) is simple, faster to train, and achieves better results for image and video coding compared to state-of-the-art codecs such as VTM, AV1, and recent deep learning techniques. In particular, it drastically minimizes perceptual distortion at low bit rates.
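The inter-coding idea in the abstract (code the first frame's latent directly, then code only the residual of each later latent) can be illustrated with a minimal sketch. This is not the SGANC implementation: the learned normalizing flow and learned entropy model are replaced here by uniform scalar quantization and an empirical-histogram bit estimate, StyleGAN inversion is skipped entirely, and latents are assumed to be plain lists of floats. All function names (`quantize`, `empirical_bits`, `encode_sequence`, `decode_sequence`) are hypothetical.

```python
import math
from collections import Counter


def quantize(values, step):
    """Uniform scalar quantization: map each value to an integer symbol."""
    return [round(v / step) for v in values]


def dequantize(symbols, step):
    """Map integer symbols back to reconstructed values."""
    return [s * step for s in symbols]


def empirical_bits(symbols):
    """Ideal code length in bits under the empirical symbol distribution.

    Stand-in for a learned entropy model (an assumption of this sketch).
    """
    counts = Counter(symbols)
    total = len(symbols)
    return -sum(c * math.log2(c / total) for c in counts.values())


def encode_sequence(latents, step):
    """Intra-code the first latent, then code residuals of later latents.

    Residuals are taken against the previous *reconstruction*, so the
    quantization error does not accumulate across frames.
    """
    streams = []
    prev = None
    for z in latents:
        if prev is None:
            symbols = quantize(z, step)
            recon = dequantize(symbols, step)
        else:
            residual = [a - b for a, b in zip(z, prev)]
            symbols = quantize(residual, step)
            recon = [b + r for b, r in zip(prev, dequantize(symbols, step))]
        streams.append(symbols)
        prev = recon
    return streams


def decode_sequence(streams, step):
    """Invert encode_sequence: rebuild each latent from its symbol stream."""
    recon = []
    prev = None
    for symbols in streams:
        values = dequantize(symbols, step)
        cur = values if prev is None else [b + r for b, r in zip(prev, values)]
        recon.append(cur)
        prev = cur
    return recon
```

In a real pipeline, each decoded latent would be passed to the generator to synthesize the frame, and the bit estimate would come from the trained entropy model rather than a histogram. Because consecutive frames of a talking face tend to have nearby latents, the residual symbols cluster around zero, which is what makes them cheap to code.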
