High-dimensional Asymptotics of VAEs: Threshold of Posterior Collapse and Dataset-Size Dependence of Rate-Distortion Curve
This work addresses poor representation learning in VAEs. It gives researchers and practitioners theoretical insight into posterior collapse and dataset-size requirements, though the contribution is incremental in that it builds on existing VAE frameworks.
The study analyzes when posterior collapse occurs in variational autoencoders (VAEs) by examining a minimal VAE in the high-dimensional limit. It finds that collapse becomes inevitable beyond a certain beta threshold regardless of dataset size, and that relatively large datasets are needed to achieve rate-distortion curves with high rates.
In variational autoencoders (VAEs), the variational posterior often collapses to the prior, a phenomenon known as posterior collapse, which degrades the quality of the learned representation. An adjustable hyperparameter beta has been introduced in VAEs to address this issue. This study sharply characterizes the conditions under which posterior collapse occurs, as a function of beta and dataset size, by analyzing a minimal VAE in a high-dimensional limit. This setting also enables the evaluation of the rate-distortion curve of the VAE. Our results show that, unlike typical regularization parameters, beta leads to "inevitable posterior collapse" beyond a certain threshold, regardless of dataset size. Moreover, the dataset-size dependence of the derived rate-distortion curve suggests that relatively large datasets are required to achieve a rate-distortion curve with high rates. These findings robustly explain generalization behavior observed in various real datasets with highly non-linear VAEs.
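To make the roles of beta, rate, and distortion concrete, here is a minimal sketch of the standard beta-weighted VAE objective for a single data point. It assumes a diagonal Gaussian posterior and a standard normal prior; the function names and the example numbers are illustrative assumptions, not taken from the paper, and the paper's minimal VAE may be parameterized differently.

```python
import math


def gaussian_kl_to_standard_normal(mu, log_var):
    """Rate term: KL( N(mu, diag(exp(log_var))) || N(0, I) ) in closed form.

    Assumption: diagonal Gaussian posterior and standard normal prior.
    """
    return 0.5 * sum(
        m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var)
    )


def beta_vae_objective(distortion, mu, log_var, beta):
    """Distortion (reconstruction loss) plus beta times the rate (KL term)."""
    return distortion + beta * gaussian_kl_to_standard_normal(mu, log_var)


# Collapsed posterior: identical to the prior, so the rate is exactly zero
# and the latent code carries no information about the input.
collapsed_rate = gaussian_kl_to_standard_normal([0.0, 0.0], [0.0, 0.0])

# Informative posterior: shifted mean and reduced variance give a positive rate.
informative_rate = gaussian_kl_to_standard_normal([1.0, -1.0], [-2.0, -2.0])

# With a collapsed posterior the objective reduces to the distortion alone,
# whatever the value of beta (the distortion value 1.0 is made up).
collapsed_objective = beta_vae_objective(1.0, [0.0, 0.0], [0.0, 0.0], beta=5.0)
```

In this picture, posterior collapse corresponds to the rate term reaching zero, and sweeping beta while recording the (rate, distortion) pairs at the optimum traces out a rate-distortion curve.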