Distilling the Knowledge from Conditional Normalizing Flows
This work addresses inference efficiency in generative modeling for domains such as vision and speech, but its contribution is incremental because it builds on existing flow-based models.
The paper tackles the overparameterization and slow inference of normalizing flows by proposing a distillation method that compresses them into more efficient models, with comparable performance on image super-resolution and speech synthesis.
Normalizing flows are a powerful class of generative models demonstrating strong performance in several speech and vision problems. In contrast to other generative models, normalizing flows are latent variable models with tractable likelihoods and allow for stable training. However, they have to be carefully designed to represent invertible functions with efficient Jacobian determinant calculation. In practice, these requirements lead to overparameterized and sophisticated architectures that are inferior to alternative feed-forward models in terms of inference time and memory consumption. In this work, we investigate whether one can distill flow-based models into more efficient alternatives. We provide a positive answer to this question by proposing a simple distillation approach and demonstrating its effectiveness on state-of-the-art conditional flow-based models for image super-resolution and speech synthesis.
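To make the architectural constraint in the abstract concrete, the following is a minimal sketch of one conditional affine coupling layer, a common building block of flow-based models. It is not the architecture or the distillation method of the paper. The names `scale_and_shift`, `coupling_forward`, `coupling_inverse`, and `log_likelihood`, as well as the toy conditioner and its coefficients, are assumptions made purely for illustration. The sketch shows why such a layer is invertible by construction and why its Jacobian log-determinant is cheap to compute, which is what gives flows a tractable likelihood.

```python
import math

def scale_and_shift(x_a, cond):
    """Hypothetical toy conditioner; in practice this would be a neural network.
    It can be any function of (x_a, cond) and does not need to be invertible."""
    s = [math.tanh(0.5 * a + 0.1 * cond) for a in x_a]  # log-scales
    t = [0.3 * a - 0.2 * cond for a in x_a]             # shifts
    return s, t

def coupling_forward(x, cond):
    """Map x -> z and return log|det J| of the mapping."""
    if len(x) % 2:
        raise ValueError("this sketch assumes an even-length input")
    half = len(x) // 2
    x_a, x_b = x[:half], x[half:]
    s, t = scale_and_shift(x_a, cond)
    z_b = [b * math.exp(si) + ti for b, si, ti in zip(x_b, s, t)]
    # The Jacobian is triangular, so its log-determinant is the sum of log-scales.
    return x_a + z_b, sum(s)

def coupling_inverse(z, cond):
    """Exact inverse: the first half passes through unchanged, so s and t
    can be recomputed from it and the affine map on the second half undone."""
    half = len(z) // 2
    z_a, z_b = z[:half], z[half:]
    s, t = scale_and_shift(z_a, cond)
    x_b = [(b - ti) * math.exp(-si) for b, si, ti in zip(z_b, s, t)]
    return z_a + x_b

def log_likelihood(x, cond):
    """Change of variables with a standard normal base density:
    log p(x | cond) = log N(z; 0, I) + log|det J|."""
    z, log_det = coupling_forward(x, cond)
    log_base = sum(-0.5 * zi * zi - 0.5 * math.log(2 * math.pi) for zi in z)
    return log_base + log_det

x = [0.5, -1.0, 2.0, 0.3]
z, log_det = coupling_forward(x, cond=0.7)
x_rec = coupling_inverse(z, cond=0.7)
```

Here `x_rec` recovers `x` up to floating-point error. The split into a pass-through half and a transformed half is what buys invertibility, and a real flow has to stack many such layers to be expressive, which is consistent with the overparameterization the abstract describes. A feed-forward model is under no such constraint.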