Rethinking LDA: moment matching for discrete ICA
This work addresses parameter estimation in topic modeling, of interest to researchers and practitioners, and presents an incremental improvement over prior moment matching methods in sample complexity and empirical performance.
The paper tackles estimation in Latent Dirichlet Allocation (LDA) by linking it to discrete independent component analysis (ICA). It derives new cumulant-based tensors with improved sample complexity and applies joint diagonalization techniques, outperforming existing moment matching methods in experiments on synthetic and real datasets.
We consider moment matching techniques for estimation in Latent Dirichlet Allocation (LDA). By drawing explicit links between LDA and discrete versions of independent component analysis (ICA), we first derive a new set of cumulant-based tensors, with an improved sample complexity. Moreover, we reuse standard ICA techniques such as joint diagonalization of tensors to improve over existing methods based on the tensor power method. In an extensive set of experiments on both synthetic and real datasets, we show that our new combination of tensors and orthogonal joint diagonalization techniques outperforms existing moment matching methods.
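As a rough illustration of what a cumulant-based moment looks like, the sketch below simulates a small discrete ICA model and checks its second-order statistic. Everything concrete here is an assumption and not taken from the abstract: the gamma-Poisson form of the model (counts x | alpha ~ Poisson(D alpha), with independent gamma-distributed alpha), the topic matrix `D`, and the gamma parameters `shape` and `scale`. Under that assumption, the law of total covariance gives cov(x) - diag(E[x]) = D diag(var(alpha)) D^T, a matrix of rank K that could serve as input to a diagonalization step. The sketch stops at second order and does not implement the paper's tensors or its joint diagonalization.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical topic matrix: M = 6 words, K = 2 topics, columns sum to 1.
D = np.array([
    [0.40, 0.05],
    [0.30, 0.05],
    [0.20, 0.10],
    [0.05, 0.20],
    [0.03, 0.30],
    [0.02, 0.30],
])

# Assumed gamma parameters for the independent topic intensities alpha_k.
shape = np.array([2.0, 3.0])
scale = 10.0
N = 200_000  # number of simulated documents

# Sample alpha (N x K), then word counts x | alpha ~ Poisson(D alpha).
alpha = rng.gamma(shape, scale, size=(N, 2))
X = rng.poisson(alpha @ D.T)

# Empirical second-order statistic: covariance minus the diagonal of the mean.
S_hat = np.cov(X, rowvar=False) - np.diag(X.mean(axis=0))

# Population counterpart under the assumed model: var(alpha_k) = shape_k * scale^2.
S_true = D @ np.diag(shape * scale**2) @ D.T

rel_err = np.linalg.norm(S_hat - S_true) / np.linalg.norm(S_true)
singular_values = np.linalg.svd(S_hat, compute_uv=False)

print("relative error:", rel_err)
print("singular values:", singular_values)
```

With enough documents, the relative error should be small and only the first K = 2 singular values should be substantially larger than the rest, which reflects the low-rank structure that moment matching methods exploit.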