ML · AI · LG · Sep 23, 2016

Fast Learning of Clusters and Topics via Sparse Posteriors

arXiv:1609.07521v1 · 7 citations
Originality: Incremental advance
AI Analysis

This work addresses a computational bottleneck for researchers and practitioners using mixture and topic models, offering a scalable solution with improved efficiency and performance, though it is incremental as it builds on existing sparse approximation methods.

The paper tackles the computational inefficiency of standard variational posteriors in mixture and topic models, which need dense storage and runtime that scales with the total number of clusters. It proposes a sparse variational distribution that allows at most $L$ non-zero entries per observation. Experiments on image patches and news articles show that this approach produces better-quality models in far less time than baseline methods.

Mixture models and topic models generate each observation from a single cluster, but standard variational posteriors for each observation assign positive probability to all possible clusters. This requires dense storage and runtime costs that scale with the total number of clusters, even though typically only a few clusters have significant posterior mass for any data point. We propose a constrained family of sparse variational distributions that allow at most $L$ non-zero entries, where the tunable threshold $L$ trades off speed for accuracy. Previous sparse approximations have used hard assignments ($L=1$), but we find that moderate values of $L>1$ provide superior performance. Our approach easily integrates with stochastic or incremental optimization algorithms to scale to millions of examples. Experiments training mixture models of image patches and topic models for news articles show that our approach produces better-quality models in far less time than baseline methods.
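To make the $L$-sparse constraint concrete, here is a minimal sketch, not the authors' implementation. It assumes the sparse posterior for one observation is formed by keeping the $L$ clusters with the largest unnormalized log posterior weights and renormalizing over them. The function name `sparse_responsibilities` and the toy weights are hypothetical.

```python
import heapq
import math


def sparse_responsibilities(log_weights, L):
    """Return {cluster_index: probability} with at most L non-zero entries.

    Assumption for illustration: keep the L largest weights, renormalize.
    """
    if not 1 <= L <= len(log_weights):
        raise ValueError("L must be between 1 and the number of clusters")
    # Indices of the L largest unnormalized log posterior weights.
    top = heapq.nlargest(L, range(len(log_weights)), key=log_weights.__getitem__)
    # Renormalize over the kept entries only; subtract the max for stability.
    m = max(log_weights[k] for k in top)
    exp_w = {k: math.exp(log_weights[k] - m) for k in top}
    total = sum(exp_w.values())
    return {k: w / total for k, w in exp_w.items()}


# Toy dense posterior over K = 5 clusters for a single observation.
log_w = [math.log(p) for p in (0.60, 0.25, 0.10, 0.04, 0.01)]
hard = sparse_responsibilities(log_w, L=1)  # hard assignment
soft = sparse_responsibilities(log_w, L=2)  # two non-zero entries
```

Storing only the kept indices and their probabilities means per-observation storage, and any downstream summary computed from it, scales with $L$ rather than with the total number of clusters. With `L=1` the sketch reduces to a hard assignment.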

