Reparameterized Variational Rejection Sampling
This work addresses flexible and accurate variational inference for models with continuous latent variables, which is of interest to practitioners in machine learning and statistics. The contribution is incremental: it builds on existing Variational Rejection Sampling (VRS) and improves its gradient estimator.
The paper tackles poor posterior approximations in variational inference by revisiting Variational Rejection Sampling (VRS) and introducing a low-variance reparameterized gradient estimator for the proposal parameters. The resulting method, Reparameterized Variational Rejection Sampling (RVRS), trades computational cost against inference fidelity, and the authors report that it performs well empirically.
Traditional approaches to variational inference rely on parametric families of variational distributions, with the choice of family playing a critical role in determining the accuracy of the resulting posterior approximation. Simple mean-field families often lead to poor approximations, while rich families of distributions like normalizing flows can be difficult to optimize and, being black-box, usually do not incorporate the known structure of the target distribution. To expand the space of flexible variational families, we revisit Variational Rejection Sampling (VRS) [Grover et al., 2018], which combines a parametric proposal distribution with rejection sampling to define a rich non-parametric family of distributions that explicitly utilizes the known target distribution. By introducing a low-variance reparameterized gradient estimator for the parameters of the proposal distribution, we make VRS an attractive inference strategy for models with continuous latent variables. We argue theoretically and demonstrate empirically that the resulting method, Reparameterized Variational Rejection Sampling (RVRS), offers a favorable trade-off between computational cost and inference fidelity. In experiments, we show that our method performs well in practice and that it is well-suited for black-box inference, especially for models with local latent variables.
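To make the idea of combining a parametric proposal with rejection sampling concrete, the following is a minimal sketch, not the authors' implementation. It assumes a one-dimensional Gaussian proposal, a toy bimodal target, and a smooth acceptance probability of the form sigmoid(log p(z) - log q(z) + T) with a threshold T; that acceptance form, the names log_target, accept_prob, and sample_vrs, and all numeric values are illustrative assumptions rather than details given in the text above. The sketch covers only the sampling step and does not include the reparameterized gradient estimator.

```python
import math
import random


def log_normal_pdf(z, mean, std):
    """Log density of a univariate Gaussian."""
    return (-0.5 * math.log(2.0 * math.pi) - math.log(std)
            - 0.5 * ((z - mean) / std) ** 2)


def log_target(z):
    """Toy target (assumption): equal mixture of N(-2, 0.5^2) and N(2, 0.5^2)."""
    a = log_normal_pdf(z, -2.0, 0.5)
    b = log_normal_pdf(z, 2.0, 0.5)
    m = max(a, b)  # log-sum-exp for numerical stability
    return m + math.log(0.5 * math.exp(a - m) + 0.5 * math.exp(b - m))


def accept_prob(z, mean, std, threshold):
    """Assumed acceptance probability: sigmoid(log p(z) - log q(z) + T)."""
    x = log_target(z) - log_normal_pdf(z, mean, std) + threshold
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def sample_vrs(mean, std, threshold, rng, max_tries=100000):
    """Draw one sample from the proposal-plus-rejection distribution.

    Returns the accepted sample and the number of proposals it took.
    """
    for tries in range(1, max_tries + 1):
        z = rng.gauss(mean, std)  # proposal q = N(mean, std^2)
        if rng.random() < accept_prob(z, mean, std, threshold):
            return z, tries
    raise RuntimeError("no proposal accepted within max_tries")


if __name__ == "__main__":
    rng = random.Random(0)
    for threshold in (50.0, -2.0):
        draws = [sample_vrs(0.0, 2.0, threshold, rng) for _ in range(1000)]
        mean_tries = sum(t for _, t in draws) / len(draws)
        near_zero = sum(abs(z) < 1.0 for z, _ in draws) / len(draws)
        print(f"T={threshold}: mean proposals per sample={mean_tries:.2f}, "
              f"fraction with |z|<1: {near_zero:.3f}")
```

In this sketch, a large threshold accepts almost every proposal, so the samples follow the Gaussian proposal at low cost. A small threshold rejects proposals in regions where the target has little mass, so the accepted samples track the bimodal target more closely while requiring more proposals per sample. This illustrates, under the stated assumptions, the kind of trade-off between computational cost and inference fidelity that the abstract describes.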