On Kernel-based Variational Autoencoder
This work addresses noisy and blurry image generation in VAEs, offering researchers and practitioners in machine learning an incremental improvement through a kernel-based method.
The paper tackles the limitations of Gaussian latent spaces in Variational Autoencoders by bridging VAEs with kernel density estimation, specifically using the Epanechnikov kernel to optimize posteriors and reduce noise in generated images. It reports improved FID scores and sharpness on benchmark datasets such as MNIST and CelebA.
In this paper, we bridge Variational Autoencoders (VAEs) and kernel density estimation (KDE) by approximating the posterior with KDEs and deriving an upper bound on the Kullback-Leibler (KL) divergence in the evidence lower bound (ELBO). The flexibility of KDEs makes it possible to optimize the posteriors in VAEs, which not only addresses the limitations of the Gaussian latent space in the vanilla VAE but also provides a new perspective on estimating the KL divergence in the ELBO. Under appropriate conditions, we show that the Epanechnikov kernel is asymptotically the optimal choice for minimizing the derived upper bound on the KL divergence. Compared with the Gaussian kernel, the Epanechnikov kernel has compact support, which should make the generated samples less noisy and blurry. Implementing the Epanechnikov kernel in the ELBO is straightforward because it belongs to the "location-scale" family of distributions, to which the reparametrization trick can be applied directly. A series of experiments on benchmark datasets such as MNIST, Fashion-MNIST, CIFAR-10 and CelebA further demonstrates the superiority of the Epanechnikov Variational Autoencoder (EVAE) over the vanilla VAE in the quality of reconstructed images, as measured by the FID score and Sharpness.
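To make the "location-scale" remark concrete, the following is a minimal sketch of how Epanechnikov noise could be drawn and then shifted and scaled, in the same way Gaussian noise is in a vanilla VAE. It is an illustration under stated assumptions, not the paper's implementation: the function names `sample_epanechnikov` and `reparametrize` are hypothetical, the sampler uses the standard three-uniform rule for the density 0.75(1 - x^2) on [-1, 1], and the `scale` argument is taken to be the half-width of the support (the paper may parametrize the scale differently, for example by the kernel's standard deviation).

```python
import random


def sample_epanechnikov(rng):
    """Draw one sample from the density 0.75 * (1 - x^2) on [-1, 1]."""
    u1 = rng.uniform(-1.0, 1.0)
    u2 = rng.uniform(-1.0, 1.0)
    u3 = rng.uniform(-1.0, 1.0)
    # Three-uniform rule: if u3 has the largest magnitude, return u2;
    # otherwise return u3. The result follows the Epanechnikov density.
    if abs(u3) >= abs(u2) and abs(u3) >= abs(u1):
        return u2
    return u3


def reparametrize(mu, scale, rng):
    """Location-scale transform z = mu + scale * eps, eps ~ Epanechnikov.

    Assumption: `scale` is the half-width of the support in each dimension,
    so every z[i] lies in [mu[i] - scale[i], mu[i] + scale[i]].
    """
    return [m + s * sample_epanechnikov(rng) for m, s in zip(mu, scale)]


rng = random.Random(0)

# Empirical check of the noise distribution: mean 0 and variance 1/5.
noise = [sample_epanechnikov(rng) for _ in range(100_000)]
mean = sum(noise) / len(noise)
var = sum((x - mean) ** 2 for x in noise) / len(noise)

# One reparametrized latent vector with hypothetical encoder outputs.
z = reparametrize([0.5, -1.0], [0.1, 2.0], rng)
print(mean, var, z)
```

Because the randomness enters only through `eps`, the latent `z` is a deterministic, differentiable function of `mu` and `scale`, which is the property the reparametrization trick needs. The bounded support is also visible here: unlike Gaussian noise, no sample can land farther than `scale` from `mu`.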