AI LG | Aug 31, 2021

Quantization of Generative Adversarial Networks for Efficient Inference: a Methodological Study

Pavel Andreev, Alexander Fritzler, Dmitry Vetrov

arXiv:2108.13996v16.115 citations

Originality Synthesis-oriented

AI Analysis

This work addresses the high computational cost and energy consumption of GAN inference in digital content creation, with the aim of making inference practical on resource-constrained devices. The contribution is incremental: it applies existing quantization techniques to GANs rather than proposing new ones.

The study tackles the challenge of deploying generative adversarial networks (GANs) on edge devices by applying quantization to reduce computational cost. It reports successful quantization of StyleGAN, Self-Attention GAN, and CycleGAN with 4/8-bit weights and 8-bit activations while preserving the quality of the full-precision models.

Generative adversarial networks (GANs) have an enormous potential impact on digital content creation, e.g., photo-realistic digital avatars, semantic content editing, and quality enhancement of speech and images. However, the performance of modern GANs comes with a massive amount of computation during inference and high energy consumption. That complicates, or even makes impossible, their deployment on edge devices. The problem can be mitigated with quantization, a neural network compression technique that facilitates hardware-friendly inference by replacing floating-point computations with low-bit integer ones. While quantization is well established for discriminative models, the performance of modern quantization techniques when applied to GANs remains unclear. GANs generate content of a more complex structure than discriminative models, and thus quantization of GANs is significantly more challenging. To tackle this problem, we perform an extensive experimental study of state-of-the-art quantization techniques on three diverse GAN architectures, namely StyleGAN, Self-Attention GAN, and CycleGAN. As a result, we discovered practical recipes that allowed us to successfully quantize these models for inference with 4/8-bit weights and 8-bit activations while preserving the quality of the original full-precision models.
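
To make "replacing floating-point computations with low-bit integer ones" concrete, here is a minimal sketch of generic symmetric uniform quantization with a single per-tensor scale. It is not the recipe from the paper, which the abstract does not spell out. The function names (`quantize`, `dequantize`, `max_abs_error`) and the sample weights are hypothetical and chosen only for illustration.

```python
def quantize(values, num_bits):
    """Map floats to signed integer codes using one shared scale (assumed scheme)."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits, 7 for 4 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer code and clamp to the representable range.
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float values from integer codes."""
    return [c * scale for c in codes]


def max_abs_error(values, num_bits):
    """Largest round-trip error introduced by quantizing to num_bits."""
    codes, scale = quantize(values, num_bits)
    restored = dequantize(codes, scale)
    return max(abs(v - r) for v, r in zip(values, restored))


# Illustrative weights only; not taken from any model in the study.
weights = [0.82, -0.31, 0.05, -1.27, 0.44, 0.0]
err8 = max_abs_error(weights, 8)
err4 = max_abs_error(weights, 4)
print(f"8-bit max error: {err8:.6f}")
print(f"4-bit max error: {err4:.6f}")
```

For values inside the clamping range, the round-trip error is bounded by half the scale, so halving the bit width from 8 to 4 makes the worst-case error roughly 18 times larger for the same tensor. That growth in error is one reason low-bit settings are the harder case.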
