ML · LG · Jul 21, 2021

A variational approximate posterior for the deep Wishart process

arXiv:2107.10125v2 · 11 citations
Originality: Incremental advance
AI Analysis

This work enables inference in deep Wishart processes (DWPs), removing a bottleneck for researchers and practitioners in Bayesian deep learning. The advance is incremental because it builds on the existing deep kernel process framework.

The paper tackles inference in DWPs, which had not been possible because sufficiently flexible distributions over positive semi-definite matrices were lacking. It develops a novel variational approximate posterior with dependency across layers, together with a doubly-stochastic inducing-point inference scheme. Experimentally, inference in the DWP can improve performance over inference in a deep Gaussian process (DGP) with the equivalent prior.

Recent work introduced deep kernel processes as an entirely kernel-based alternative to NNs (Aitchison et al. 2020). Deep kernel processes flexibly learn good top-layer representations by alternately sampling the kernel from a distribution over positive semi-definite matrices and performing nonlinear transformations. A particular deep kernel process, the deep Wishart process (DWP), is of particular interest because its prior can be made equivalent to deep Gaussian process (DGP) priors for kernels that can be expressed entirely in terms of Gram matrices. However, inference in DWPs has not yet been possible due to the lack of sufficiently flexible distributions over positive semi-definite matrices. Here, we give a novel approach to obtaining flexible distributions over positive semi-definite matrices by generalising the Bartlett decomposition of the Wishart probability density. We use this new distribution to develop an approximate posterior for the DWP that includes dependency across layers. We develop a doubly-stochastic inducing-point inference scheme for the DWP and show experimentally that inference in the DWP can improve performance over doing inference in a DGP with the equivalent prior.
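For readers unfamiliar with the Bartlett decomposition mentioned in the abstract, the following is a minimal sketch of the standard (not the paper's generalised) construction, assuming NumPy. A Wishart sample with scale matrix Σ = L Lᵀ and ν degrees of freedom can be written as W = L A Aᵀ Lᵀ, where A is lower triangular with standard normal entries below the diagonal and square roots of chi-squared variables on the diagonal. The function name `sample_wishart_bartlett` and its parameters are illustrative and do not come from the paper or its code.

```python
import numpy as np


def sample_wishart_bartlett(scale, df, rng):
    """Draw one sample W ~ Wishart(scale, df) via the standard Bartlett decomposition.

    Illustrative sketch only: this is the textbook construction, not the
    generalised distribution proposed in the paper.
    """
    scale = np.asarray(scale, dtype=float)
    p = scale.shape[0]
    if df <= p - 1:
        raise ValueError("df must exceed p - 1 for a full-rank Wishart sample")

    # Cholesky factor of the scale matrix: scale = chol @ chol.T
    chol = np.linalg.cholesky(scale)

    # Strictly lower-triangular part of A: independent standard normals.
    A = np.tril(rng.standard_normal((p, p)), k=-1)

    # Diagonal of A: A[i, i]^2 ~ chi-squared with (df - i) degrees of freedom
    # for zero-based i = 0, ..., p - 1.
    A[np.diag_indices(p)] = np.sqrt(rng.chisquare(df - np.arange(p)))

    # W = (chol A)(chol A)^T is positive definite with probability one.
    factor = chol @ A
    return factor @ factor.T


rng = np.random.default_rng(0)
scale = np.array([[2.0, 0.5], [0.5, 1.0]])
W = sample_wishart_bartlett(scale, df=4, rng=rng)
```

Because the sample is built from a triangular factor, positive semi-definiteness holds by construction. The abstract states that the paper generalises this decomposition to obtain more flexible distributions; the details of that generalisation are not reproduced here.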

Code Implementations: 1 repo