A Multi-Resolution Framework for U-Nets with Applications to Hierarchical VAEs
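This work addresses theoretical gaps in the understanding of U-Net regularisation for researchers in deep learning, offering incremental theoretical insights alongside a practical efficiency gain: state-of-the-art hierarchical VAE performance with half the parameters of existing models.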
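The paper tackles the understudied regularisation properties and wavelet relationships of U-Net architectures by formulating a multi-resolution framework that identifies U-Nets as finite-dimensional truncations of infinite-dimensional models. It proves that average pooling corresponds to projection in the space of square-integrable functions and shows that U-Nets with average pooling implicitly learn a Haar wavelet basis representation of the data. It then uses the framework to analyse hierarchical VAEs, showing that they are a two-step forward Euler discretisation of multi-resolution diffusion processes flowing from a point mass, which introduces sampling instabilities, and achieves state-of-the-art performance with half the parameters through improved weight-sharing.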
U-Net architectures are ubiquitous in state-of-the-art deep learning; however, their regularisation properties and relationship to wavelets are understudied. In this paper, we formulate a multi-resolution framework that identifies U-Nets as finite-dimensional truncations of models on an infinite-dimensional function space. We provide theoretical results proving that average pooling corresponds to projection within the space of square-integrable functions, and we show that U-Nets with average pooling implicitly learn a Haar wavelet basis representation of the data. We then leverage our framework to identify state-of-the-art hierarchical VAEs (HVAEs), which have a U-Net architecture, as a type of two-step forward Euler discretisation of multi-resolution diffusion processes that flow from a point mass, which introduces sampling instabilities. We also demonstrate that HVAEs learn a representation of time, which allows for improved parameter efficiency through weight-sharing. We use this observation to achieve state-of-the-art HVAE performance with half the number of parameters of existing models, exploiting the properties of our continuous-time formulation.
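The projection claim can be checked numerically in the simplest discrete setting. The following is a minimal sketch, not the paper's implementation: it assumes a 1D signal of even length, a pooling window of 2, and nearest-neighbour upsampling as the embedding back onto the fine grid. The helper names `avg_pool_1d`, `upsample_1d`, and `project` are hypothetical. Under these assumptions, pooling followed by upsampling behaves as an orthogonal projection onto piecewise-constant signals, and the residual it discards is the Haar detail component.

```python
import numpy as np

def avg_pool_1d(x):
    """Average adjacent pairs, halving the resolution (assumes even length)."""
    return x.reshape(-1, 2).mean(axis=1)

def upsample_1d(c):
    """Embed coarse values on the fine grid as a piecewise-constant signal."""
    return np.repeat(c, 2)

def project(x):
    """Pool then upsample: maps x to its best piecewise-constant approximation."""
    return upsample_1d(avg_pool_1d(x))

rng = np.random.default_rng(0)
x = rng.normal(size=8)

coarse = project(x)     # component in the coarse (piecewise-constant) subspace
detail = x - coarse     # component discarded by average pooling

# A projection is idempotent: applying it twice changes nothing.
assert np.allclose(project(coarse), coarse)

# The residual is orthogonal to every piecewise-constant signal on the grid.
any_coarse = upsample_1d(rng.normal(size=4))
assert np.isclose(np.dot(detail, any_coarse), 0.0)

# On each pair (a, b) the residual is ((a - b) / 2) * [+1, -1],
# i.e. a multiple of the (unnormalised) Haar wavelet on that pair.
haar_coeff = (x[0::2] - x[1::2]) / 2
expected_detail = np.stack([haar_coeff, -haar_coeff], axis=1)
assert np.allclose(detail.reshape(-1, 2), expected_detail)
```

Repeating `avg_pool_1d` on its own output produces successively coarser approximations, and the detail discarded at each level is the corresponding level of Haar coefficients. This is only the discrete analogue of the function-space statement in the abstract.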