Extreme Stochastic Variational Inference: Distributed and Asynchronous
This addresses the problem of scaling variational inference to massive datasets with billions of parameters for researchers and practitioners in machine learning, offering a novel distributed approach.
The paper tackles the scalability limitations of stochastic variational inference (SVI) by proposing extreme stochastic variational inference (ESVI), an asynchronous and lock-free algorithm that provides data and model parallelism, and demonstrates its effectiveness by running Latent Dirichlet Allocation on a dataset with 3 million vocabulary and 3 billion tokens, outperforming VI and SVI in wallclock-time and achieving better solution quality.
Stochastic variational inference (SVI), the state-of-the-art algorithm for scaling variational inference to large-datasets, is inherently serial. Moreover, it requires the parameters to fit in the memory of a single processor; this is problematic when the number of parameters is in billions. In this paper, we propose extreme stochastic variational inference (ESVI), an asynchronous and lock-free algorithm to perform variational inference for mixture models on massive real world datasets. ESVI overcomes the limitations of SVI by requiring that each processor only access a subset of the data and a subset of the parameters, thus providing data and model parallelism simultaneously. We demonstrate the effectiveness of ESVI by running Latent Dirichlet Allocation (LDA) on UMBC-3B, a dataset that has a vocabulary of 3 million and a token size of 3 billion. In our experiments, we found that ESVI not only outperforms VI and SVI in wallclock-time, but also achieves a better quality solution. In addition, we propose a strategy to speed up computation and save memory when fitting large number of topics.