Multivariate Variational Autoencoder
This work targets machine learning researchers and practitioners who need to model complex latent correlations in VAEs, and it offers an incremental improvement over existing diagonal-covariance VAE methods.
The paper tackles the diagonal posterior covariance limitation of Variational Autoencoders (VAEs) by introducing the Multivariate Variational Autoencoder (MVAE). MVAE combines a global coupling matrix with per-sample diagonal scales to model full covariance while remaining tractable. It reports improved reconstruction, calibration, and unsupervised structure on MNIST variants, Fashion-MNIST, CIFAR-10, and CIFAR-100.
We present the Multivariate Variational Autoencoder (MVAE), a VAE variant that preserves Gaussian tractability while lifting the diagonal posterior restriction. MVAE factorizes each posterior covariance as $\boldsymbol{\Sigma}=\mathbf{C}\,\mathrm{diag}(\boldsymbol{\sigma}^{2})\,\mathbf{C}^{\top}$, where a \emph{global} coupling matrix $\mathbf{C}$ induces dataset-wide latent correlations and \emph{per-sample} diagonal scales $\boldsymbol{\sigma}$ modulate local uncertainty. This yields a full-covariance family with an analytic KL term and an efficient reparameterization via $\mathbf{L}=\mathbf{C}\,\mathrm{diag}(\boldsymbol{\sigma})$, so that $\boldsymbol{\Sigma}=\mathbf{L}\mathbf{L}^{\top}$. Across Larochelle-style MNIST variants, Fashion-MNIST, CIFAR-10, and CIFAR-100, MVAE consistently matches or improves reconstruction (MSE~$\downarrow$) and delivers robust gains in calibration (NLL/Brier/ECE~$\downarrow$) and unsupervised structure (NMI/ARI~$\uparrow$) relative to diagonal-covariance VAEs with matched capacity, especially at mid-range latent sizes. Latent-plane visualizations further indicate smoother, more coherent factor traversals and sharper local detail. We release a fully reproducible implementation with training/evaluation scripts and sweep utilities to facilitate fair comparison and reuse.
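To make the factor $\mathbf{L}=\mathbf{C}\,\mathrm{diag}(\boldsymbol{\sigma})$ concrete, the following NumPy sketch draws a reparameterized sample and evaluates the closed-form KL term. It is an illustration under stated assumptions, not the released implementation: the prior is assumed to be a standard normal $\mathcal{N}(\mathbf{0},\mathbf{I})$, $\mathbf{C}$ is assumed square and invertible, and the function names `sample_latent` and `kl_to_standard_normal` are hypothetical.

```python
import numpy as np


def sample_latent(mu, sigma, C, rng):
    """Reparameterized draw z = mu + C diag(sigma) eps, with eps ~ N(0, I).

    mu, sigma: arrays of shape (batch, d), per-sample mean and scales.
    C: array of shape (d, d), the shared (global) coupling matrix.
    """
    eps = rng.standard_normal(mu.shape)
    # sigma * eps applies diag(sigma) per sample; @ C.T applies the shared C.
    return mu + (sigma * eps) @ C.T


def kl_to_standard_normal(mu, sigma, C):
    """Per-sample KL( N(mu, L L^T) || N(0, I) ) with L = C diag(sigma).

    Assumes a standard normal prior and an invertible C (both assumptions
    of this sketch, not statements about the paper's code).
    """
    d = mu.shape[-1]
    # tr(L L^T) = ||L||_F^2 = sum_j sigma_j^2 * ||C[:, j]||^2
    col_sq = np.sum(C ** 2, axis=0)
    trace = np.sum(sigma ** 2 * col_sq, axis=-1)
    # log det(L L^T) = 2 log|det C| + 2 sum_j log sigma_j
    _, logabsdet_C = np.linalg.slogdet(C)
    logdet = 2.0 * logabsdet_C + 2.0 * np.sum(np.log(sigma), axis=-1)
    return 0.5 * (trace + np.sum(mu ** 2, axis=-1) - d - logdet)
```

Because $\mathbf{C}$ is shared across samples, its log-determinant is computed once per batch, and the trace term needs only the squared column norms of $\mathbf{C}$. Under these assumptions the KL costs $O(d)$ per sample beyond that one-time $O(d^3)$ determinant. With $\mathbf{C}=\mathbf{I}$ the expression reduces to the usual diagonal-covariance VAE KL.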