Top-down inference in an early visual cortex inspired hierarchical Variational Autoencoder
This work addresses hierarchical visual processing, a problem relevant to both neuroscience and machine learning. Its contribution appears incremental, since it builds on existing VAE frameworks and differs mainly in specific architectural variations.
The authors model hierarchical computations in the early visual cortex using hierarchical Variational Autoencoders (VAEs) trained on natural images. They show that representations similar to those of the primary and secondary visual cortices emerge under mild inductive biases, and that a top-down processing component in the recognition model is critical for learning higher-order moments of the posterior and for image inpainting.
Interpreting computations in the visual cortex as learning and inference in a generative model of the environment has received wide support in both neuroscience and cognitive science. However, hierarchical computations, a hallmark of visual cortical processing, have remained out of reach for generative models because of a lack of adequate tools to address them. Here we capitalize on advances in Variational Autoencoders (VAEs) to investigate the early visual cortex with sparse coding hierarchical VAEs trained on natural images. We design alternative architectures that vary in both the generative and the recognition components of the two-latent-layer VAE. We show that representations similar to those found in the primary and secondary visual cortices naturally emerge under mild inductive biases. Importantly, a nonlinear representation for texture-like patterns is a stable property of the high-level latent space that is robust to the specific architecture of the VAE, reminiscent of the secondary visual cortex. We show that a neuroscience-inspired choice of the recognition model, which features a top-down processing component, is critical for two signatures of computations with generative models: learning higher-order moments of the posterior beyond the mean, and image inpainting. Patterns in higher-order response statistics provide inspiration for neuroscience to interpret response correlations and for machine learning to evaluate the learned representations through a more detailed characterization of the posterior.
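To make the role of the top-down component concrete, the toy sketch below contrasts two recognition models for a two-latent-layer hierarchy: a purely bottom-up one, q(z1 | x), and a top-down one, q(z1 | x, z2) with z2 drawn from q(z2 | x). It is not the architecture used in this work. The layer sizes, the random linear maps standing in for trained networks, the Gaussian noise levels, and all function names are assumptions chosen for illustration. The sketch shows only one general point: when the low-level posterior is conditioned on the high-level latent, marginalizing over z2 can induce correlations among the components of z1, whereas a bottom-up posterior with independent noise cannot.

```python
import numpy as np

# Hypothetical sizes: image-patch pixels, low-level (z1) and high-level (z2) latents.
D_X, D_Z1, D_Z2 = 64, 32, 8

# Random linear maps stand in for trained recognition networks (illustration only).
_init = np.random.default_rng(0)
W_X_Z2 = _init.normal(scale=0.1, size=(D_Z2, D_X))    # x  -> mean of z2
W_X_Z1 = _init.normal(scale=0.1, size=(D_Z1, D_X))    # x  -> bottom-up part of z1 mean
W_Z2_Z1 = _init.normal(scale=0.1, size=(D_Z1, D_Z2))  # z2 -> top-down part of z1 mean


def posterior_z1_samples(x, n, top_down, rng):
    """Draw n samples of z1 for a single input x.

    top_down=True : z2 ~ q(z2 | x), then z1 ~ q(z1 | x, z2)
    top_down=False: z1 ~ q(z1 | x), ignoring z2
    """
    # Unit-variance Gaussian samples of z2 around a linear readout of x.
    z2 = W_X_Z2 @ x + rng.normal(size=(n, D_Z2))
    mean = np.tile(W_X_Z1 @ x, (n, 1))
    if top_down:
        mean = mean + z2 @ W_Z2_Z1.T
    # Independent (diagonal) noise on z1 in both variants.
    return mean + 0.1 * rng.normal(size=(n, D_Z1))


def mean_abs_offdiag_cov(samples):
    """Average absolute off-diagonal entry of the sample covariance."""
    cov = np.cov(samples, rowvar=False)
    off_diagonal = ~np.eye(cov.shape[0], dtype=bool)
    return np.abs(cov[off_diagonal]).mean()


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.normal(size=D_X)  # a stand-in "image patch"
    for flag in (False, True):
        samples = posterior_z1_samples(x, 5000, flag, rng)
        print(f"top_down={flag}: mean |off-diagonal cov| = {mean_abs_offdiag_cov(samples):.4f}")
```

In this toy setting the bottom-up variant yields near-zero off-diagonal covariance, since its only source of variability is independent noise, while the top-down variant inherits correlated variability from the shared high-level latent. Whether and how such correlations arise in the trained models is an empirical question that the toy does not settle.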