Posterior Approximation using Stochastic Gradient Ascent with Adaptive Stepsize
This work provides a scalable solution for posterior approximation in Bayesian nonparametrics, addressing a bottleneck for researchers and practitioners dealing with large datasets, though it is incremental as it builds on existing stochastic optimization techniques.
The authors tackle the problem of scaling Bayesian nonparametric models such as Dirichlet process mixtures to large datasets by proposing a stochastic gradient ascent method with adaptive stepsize, achieving performance comparable to closed-form coordinate ascent while retaining the speed of stochastic updates.
Scalable posterior approximation algorithms allow Bayesian nonparametric models such as the Dirichlet process mixture to scale to larger datasets at a fraction of the cost. Recent algorithms, notably stochastic variational inference, perform local learning from minibatches. The main limitation of stochastic variational inference is that it relies on closed-form solutions. Stochastic gradient ascent is a modern approach to machine learning and is widely deployed in the training of deep neural networks. In this work, we explore stochastic gradient ascent as a fast algorithm for the posterior approximation of the Dirichlet process mixture. However, stochastic gradient ascent alone is not optimal for learning. To achieve both speed and performance, we turn our focus to stepsize optimization in stochastic gradient ascent. As an intermediate approach, we first optimize the stepsize using the momentum method. Finally, we introduce Fisher information to allow an adaptive stepsize in our posterior approximation. In the experiments, we show that our approach using stochastic gradient ascent does not sacrifice performance for speed when compared to closed-form coordinate ascent learning on the evaluated datasets. Lastly, our approach is also compatible with deep ConvNet features and scales to datasets with many classes, such as Caltech256 and SUN397.
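The three stepsize strategies named in the abstract (plain stochastic gradient ascent, momentum, and a Fisher-information-based adaptive stepsize) can be illustrated with a minimal sketch. This is not the authors' update rule, and it does not involve a Dirichlet process mixture. It assumes a toy problem (maximum-likelihood estimation of the mean of a unit-variance Gaussian) and approximates the Fisher information by a running average of squared per-sample gradients. The function name `sga_fit_mean` and all hyperparameter values are hypothetical choices made for illustration.

```python
import numpy as np


def sga_fit_mean(data, mode="plain", lr=0.05, beta=0.9, batch_size=64,
                 epochs=20, eps=1e-8, seed=0):
    """Toy minibatch gradient ascent on the log-likelihood of N(x | mu, I).

    mode: "plain"    -> fixed stepsize
          "momentum" -> heavy-ball momentum
          "fisher"   -> stepsize divided by a running diagonal
                        empirical Fisher estimate (an assumption of this sketch)
    """
    rng = np.random.default_rng(seed)
    n, d = data.shape
    mu = np.zeros(d)          # parameter being learned
    velocity = np.zeros(d)    # momentum buffer
    fisher = np.ones(d)       # running diagonal Fisher estimate

    for _ in range(epochs):
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            batch = data[order[start:start + batch_size]]
            # Per-sample gradient of log N(x | mu, I) with respect to mu.
            per_sample_grad = batch - mu
            grad = per_sample_grad.mean(axis=0)

            if mode == "plain":
                mu = mu + lr * grad
            elif mode == "momentum":
                velocity = beta * velocity + lr * grad
                mu = mu + velocity
            elif mode == "fisher":
                # Exponential moving average of squared per-sample gradients.
                sq = (per_sample_grad ** 2).mean(axis=0)
                fisher = beta * fisher + (1.0 - beta) * sq
                mu = mu + lr * grad / (fisher + eps)
            else:
                raise ValueError(f"unknown mode: {mode}")
    return mu


# Synthetic data: 2000 points around a hypothetical true mean.
rng = np.random.default_rng(1)
true_mean = np.array([2.0, -1.0, 0.5])
data = true_mean + rng.standard_normal((2000, 3))

for mode in ("plain", "momentum", "fisher"):
    print(mode, sga_fit_mean(data, mode=mode))
```

All three variants ascend the same minibatch log-likelihood gradient and differ only in how the step is scaled. In the `fisher` variant the effective stepsize shrinks where the squared gradients are large, which is the general idea behind an adaptive stepsize. How the paper actually defines and applies the Fisher information is not specified in the abstract.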