An Equivalence between Bayesian Priors and Penalties in Variational Inference
This work clarifies, for researchers and practitioners who use variational inference, how regularization penalties relate to Bayesian priors.
The paper characterizes which regularization penalties in variational inference correspond to valid Bayesian priors and provides a systematic method to compute the prior for a given penalty. The characterization also yields constraints on the penalty under which the procedure remains Bayesian.
In machine learning, it is common to optimize the parameters of a probabilistic model together with an ad hoc regularization term that penalizes some values of the parameters. Regularization terms appear naturally in variational inference, a tractable way to approximate Bayesian posteriors: the loss to optimize contains a Kullback--Leibler divergence term between the approximate posterior and a Bayesian prior. We fully characterize the regularizers that can arise from this procedure, and provide a systematic way to compute the prior corresponding to a given penalty. Such a characterization can be used to discover constraints on the penalty function, so that the overall procedure remains Bayesian.
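As a small illustration of how a penalty arises from the Kullback--Leibler term, consider the standard textbook case of a one-dimensional Gaussian approximate posterior N(mu, sigma^2) and a zero-mean Gaussian prior N(0, s^2). This is not the general construction of the paper, only a sketch under those assumptions; the function names `kl_gaussian` and `kl_numeric` are hypothetical and chosen for this example. In this case the divergence has the closed form log(s/sigma) + (sigma^2 + mu^2)/(2 s^2) - 1/2, so its dependence on mu is exactly the familiar L2 penalty mu^2/(2 s^2).

```python
import math


def kl_gaussian(mu, sigma, prior_std):
    """Closed-form KL( N(mu, sigma^2) || N(0, prior_std^2) )."""
    return (
        math.log(prior_std / sigma)
        + (sigma ** 2 + mu ** 2) / (2.0 * prior_std ** 2)
        - 0.5
    )


def kl_numeric(mu, sigma, prior_std, n=20001, width=12.0):
    """Trapezoidal estimate of the same KL, used only as a sanity check."""
    lo, hi = mu - width * sigma, mu + width * sigma
    h = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        x = lo + i * h
        # Log-densities of the approximate posterior q and the prior p at x.
        log_q = -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))
        log_p = -0.5 * (x / prior_std) ** 2 - math.log(prior_std * math.sqrt(2 * math.pi))
        weight = 0.5 if i in (0, n - 1) else 1.0  # trapezoid end-point weights
        total += weight * math.exp(log_q) * (log_q - log_p)
    return total * h


if __name__ == "__main__":
    mu, sigma, s = 2.0, 0.5, 1.0
    # The part of the KL term that depends on mu is the L2 penalty mu^2 / (2 s^2).
    penalty_on_mu = kl_gaussian(mu, sigma, s) - kl_gaussian(0.0, sigma, s)
    print(penalty_on_mu, mu ** 2 / (2 * s ** 2))
    # The closed form should agree with direct numerical integration.
    print(kl_gaussian(mu, sigma, s), kl_numeric(mu, sigma, s))
```

Read in the other direction, this is the simplest instance of the question the abstract poses: starting from the penalty mu^2/(2 s^2), one recovers a Gaussian prior with standard deviation s. Other penalties may or may not correspond to a valid prior.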