ST LG ML Aug 9, 2019

Convergence Rates of Variational Inference in Sparse Deep Learning

arXiv:1908.04847v2, 44 citations
AI Analysis

This paper provides theoretical justification for variational inference in sparse deep learning, addressing the problem of efficient Bayesian approximation for machine learning researchers and practitioners. The contribution is incremental in that it builds on existing convergence theory.

The paper demonstrates that variational inference in sparse deep learning achieves near-minimax convergence rates for Hölder smooth functions, matching the generalization properties of exact Bayesian inference, and shows that model selection via ELBO maximization avoids overfitting and adaptively attains optimal rates.

Variational inference is increasingly popular for approximating intractable posterior distributions in Bayesian statistics and machine learning. Meanwhile, a few recent works have provided theoretical justification and new insights on deep neural networks for estimating smooth functions in standard settings such as nonparametric regression. In this paper, we show that variational inference for sparse deep learning retains the same generalization properties as exact Bayesian inference. In particular, we highlight the connection between estimation and approximation theories via the classical bias-variance trade-off and show that it leads to near-minimax rates of convergence for Hölder smooth functions. Additionally, we show that the model selection framework over the neural network architecture via ELBO maximization does not overfit and adaptively achieves the optimal rate of convergence.
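
For orientation, the two objects named in the abstract can be written in their standard textbook forms. This is a sketch, not the paper's own statement. The notation is assumed here: $n$ observations $\mathcal{D}_n$, input dimension $d$, Hölder smoothness $\beta$, prior $\pi$, variational family $\mathcal{F}$, network parameters $\theta$. The paper's exact objective, constants, and logarithmic factors may differ.

```latex
% Evidence lower bound (ELBO) for a candidate approximation q in the family F,
% and the variational approximation obtained by maximizing it.
\mathrm{ELBO}(q) = \mathbb{E}_{\theta \sim q}\big[\log p(\mathcal{D}_n \mid \theta)\big]
                 - \mathrm{KL}(q \,\|\, \pi),
\qquad
\hat{q} = \operatorname*{arg\,max}_{q \in \mathcal{F}} \mathrm{ELBO}(q).

% Classical minimax rate (squared L2 risk) for estimating a beta-Hölder
% regression function of d variables from n samples.
\inf_{\hat{f}} \sup_{f_0 \in \mathcal{C}^{\beta}}
\mathbb{E}\,\big\| \hat{f} - f_0 \big\|_{2}^{2} \asymp n^{-2\beta/(2\beta + d)}.
```

In this reading, "near-minimax" means that the rate $n^{-2\beta/(2\beta+d)}$ is attained up to logarithmic factors in $n$.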
