LGIRMLOct 30, 2013

Necessary and Sufficient Conditions for Novel Word Detection in Separable Topic Models

arXiv:1310.7994v14 citations
Originality Incremental advance
AI Analysis

This work addresses the challenge of topic modeling in big-data scenarios by providing a more practical algorithm with theoretical guarantees, though it builds on existing conditions and is incremental in improving computational feasibility.

The paper tackled the problem of detecting novel words in separable topic models, showing that the simplicial condition is necessary for consistent estimation and presenting a practical quadratic-complexity algorithm that consistently detects all novel words using only second-order word moments.

The simplicial condition and other stronger conditions that imply it have recently played a central role in developing polynomial time algorithms with provable asymptotic consistency and sample complexity guarantees for topic estimation in separable topic models. Of these algorithms, those that rely solely on the simplicial condition are impractical while the practical ones need stronger conditions. In this paper, we demonstrate, for the first time, that the simplicial condition is a fundamental, algorithm-independent, information-theoretic necessary condition for consistent separable topic estimation. Furthermore, under solely the simplicial condition, we present a practical quadratic-complexity algorithm based on random projections which consistently detects all novel words of all topics using only up to second-order empirical word moments. This algorithm is amenable to distributed implementation making it attractive for 'big-data' scenarios involving a network of large distributed databases.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes