Occam's Razor is Only as Sharp as Your ELBO
For Bayesian practitioners who use variational inference for model selection, the paper shows that reduced-rank assumptions on the approximate posterior covariance can cause ELBO-based hyperparameter learning to overfit, and it cautions against naive use of the ELBO for this purpose.
The paper shows that ELBO-based hyperparameter learning in an over-parameterized regression model can overfit, depending on the assumed rank of the Gaussian approximate posterior covariance. It also shows that, when only the underfit and overfit options are compared, the marginal likelihood sometimes prefers the overfit version while the ELBO does not.
The marginal likelihood, also known as the evidence, is regarded as a mathematical embodiment of Occam's razor, enabling model selection that avoids overfitting. The evidence lower bound (ELBO) objective from variational inference has also been used for similar purposes. Prior work has shown that restricting the approximate posterior family via a mean-field approximation can lead the ELBO to underfit. In this paper, we show how ELBO-based hyperparameter learning in a simple over-parameterized regression model can also produce overfitting, depending on the assumed rank of the covariance matrix in a Gaussian approximate posterior. Surprisingly, among only the underfit and overfit options, Bayesian model selection via the evidence itself sometimes prefers the overfit version, while the ELBO does not. Bayesian practitioners hoping to scale to large models should be cautious about how reduced-rank assumptions needed for tractability may impact the potential for model selection.
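To make the relationship between the evidence and the ELBO concrete, the following is a minimal NumPy sketch and not the paper's experimental setup. It assumes a conjugate Bayesian linear regression with prior w ~ N(0, I/alpha) and noise variance sigma2, for which the log evidence has a closed form, and it compares that value with the ELBO under a full-covariance Gaussian posterior and under a mean-field (diagonal) one. The mean-field family stands in here for a restricted posterior family; the paper's reduced-rank construction is not reproduced. All names (`log_evidence`, `elbo`, `alpha`, `sigma2`) and the synthetic data are illustrative assumptions.

```python
import numpy as np

def log_evidence(X, y, alpha, sigma2):
    """Closed-form log marginal likelihood: y ~ N(0, sigma2*I + X X^T / alpha)."""
    n = X.shape[0]
    K = sigma2 * np.eye(n) + (X @ X.T) / alpha
    _, logdet = np.linalg.slogdet(K)
    quad = y @ np.linalg.solve(K, y)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + quad)

def elbo(X, y, alpha, sigma2, m, S):
    """ELBO for q(w) = N(m, S): E_q[log p(y|w)] + E_q[log p(w)] + H[q]."""
    n, d = X.shape
    resid = y - X @ m
    exp_loglik = (-0.5 * n * np.log(2 * np.pi * sigma2)
                  - 0.5 / sigma2 * (resid @ resid + np.trace(X @ S @ X.T)))
    exp_logprior = (-0.5 * d * np.log(2 * np.pi / alpha)
                    - 0.5 * alpha * (m @ m + np.trace(S)))
    entropy = 0.5 * d * np.log(2 * np.pi * np.e) + 0.5 * np.linalg.slogdet(S)[1]
    return exp_loglik + exp_logprior + entropy

# Synthetic over-parameterized problem (assumed sizes: more features than data).
rng = np.random.default_rng(0)
n, d = 5, 20
X = rng.normal(size=(n, d))
y = rng.normal(size=n)
alpha, sigma2 = 1.0, 0.1

# Exact posterior precision, covariance, and mean.
A = alpha * np.eye(d) + X.T @ X / sigma2
S_full = np.linalg.inv(A)
m = S_full @ X.T @ y / sigma2

# KL-optimal mean-field covariance for a Gaussian target: invert the
# diagonal of the posterior precision. The optimal mean is unchanged.
S_mf = np.diag(1.0 / np.diag(A))

log_ev = log_evidence(X, y, alpha, sigma2)
elbo_full = elbo(X, y, alpha, sigma2, m, S_full)
elbo_mf = elbo(X, y, alpha, sigma2, m, S_mf)

print("log evidence     :", log_ev)
print("ELBO (full cov)  :", elbo_full)
print("ELBO (mean-field):", elbo_mf)
```

With the full covariance, q matches the exact posterior and the ELBO coincides with the log evidence. With the restricted family, the ELBO falls below the evidence by the KL divergence from q to the exact posterior. Because that gap depends on the hyperparameters, optimizing `alpha` or `sigma2` against a restricted-family ELBO need not select the same values as optimizing the evidence.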