Impact of Bottleneck Layers and Skip Connections on the Generalization of Linear Denoising Autoencoders
This work addresses generalization in unsupervised learning, giving theoretical insight into architectural choices in denoising autoencoders; it is incremental in that it extends prior analyses of linear models.
The paper tackles the generalization of two-layer linear denoising autoencoders by analyzing the effects of bottleneck layers and skip connections, deriving closed-form test risk formulas and revealing that bottlenecks introduce a bias-variance trade-off while skip connections reduce variance in overparameterized regimes.
Modern deep neural networks exhibit strong generalization even in highly overparameterized regimes. Significant progress has been made in understanding this phenomenon in the context of supervised learning, but for unsupervised tasks such as denoising, several open questions remain. While some recent works have successfully characterized the test error of the linear denoising problem, they are limited to one-layer linear models. In this work, we focus on two-layer linear denoising autoencoders trained under gradient flow, incorporating two key ingredients of modern deep learning architectures: a low-dimensional bottleneck layer that effectively enforces a rank constraint on the learned solution, and an optional skip connection that bypasses the bottleneck. We derive closed-form expressions for all critical points of this model under product regularization, and in particular describe its global minimizer under the minimum-norm principle. From there, we derive the test risk formula in the overparameterized regime, both for models with and without skip connections. Our analysis reveals two interesting phenomena. First, the bottleneck layer introduces an additional complexity measure akin to the classical bias-variance trade-off -- increasing the bottleneck width reduces bias but introduces variance, and vice versa. Second, skip connections can mitigate the variance in denoising autoencoders -- especially when the model is mildly overparameterized. We further analyze the impact of skip connections in denoising autoencoders using random matrix theory and support our claims with numerical evidence.