CL · IR · LG · Nov 19, 2017

Prior-aware Dual Decomposition: Document-specific Topic Inference for Spectral Topic Models

arXiv:1711.07065v1 · 0.71 citations

Originality: Highly original

AI Analysis

This work addresses the inability of spectral topic models to provide document-level topic information, offering an efficient, parallelizable solution for researchers and practitioners in text analysis.

The paper tackled the problem of inferring document-specific topic compositions for spectral topic models, which lack this capability, by proposing a Prior-aware Dual Decomposition (PADD) method that uses topic correlations as a prior. The results showed that PADD notably outperformed existing methods such as TLI and achieved quality comparable to Gibbs sampling on various datasets.

Spectral topic modeling algorithms operate on matrices or tensors of word co-occurrence statistics to learn topic-specific word distributions. This approach removes the dependence on the original documents and yields substantial gains in efficiency as well as provable topic inference, but at a cost: the model can no longer provide information about the topic composition of individual documents. Recently, Thresholded Linear Inverse (TLI) was proposed to map the observed words of each document back to its topic composition. However, its linear nature limits inference quality because it does not take important prior information over topics into account. In this paper, we evaluate the Simple Probabilistic Inverse (SPI) method and a novel Prior-aware Dual Decomposition (PADD) method, which is capable of learning document-specific topic compositions in parallel. Experiments show that PADD successfully leverages topic correlations as a prior, notably outperforming TLI and learning topic compositions of quality comparable to Gibbs sampling on various datasets.
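The abstract contrasts a linear inverse (TLI) with a probabilistic one (SPI) for recovering a document's topic proportions from a learned topic-word matrix. The following is a minimal NumPy sketch of both ideas under stated assumptions, not the paper's implementation. It assumes the topic-word matrix `A` (column k holds p(word | topic k)) has already been learned by a spectral method. `tli_estimate` uses the Moore-Penrose pseudo-inverse as its left inverse and renormalizes after thresholding, whereas the published TLI constructs its own left inverse; the threshold `tau` is an illustrative value. `spi_estimate` is an assumed reading of a "simple probabilistic inverse" via Bayes' rule with a topic prior. Both function names are hypothetical, and PADD itself, with its topic-correlation prior, is not sketched.

```python
import numpy as np


def tli_estimate(A, counts, tau=0.05):
    """Simplified thresholded linear inverse for one document (hypothetical helper).

    A      : (V, K) topic-word matrix; column k holds p(word | topic k).
    counts : (V,) word counts of the document.
    tau    : threshold below which topic weights are zeroed (illustrative value).
    """
    f = counts / counts.sum()   # empirical word distribution of the document
    B = np.linalg.pinv(A)       # (K, V) left inverse; ASSUMPTION: pinv, not TLI's own construction
    theta = B @ f               # E[f] = A @ theta, so B @ f is a linear estimate of theta
    theta[theta < tau] = 0.0    # drop small and negative weights
    total = theta.sum()
    # ASSUMPTION: renormalize so the result is a distribution over topics
    return theta / total if total > 0 else theta


def spi_estimate(A, counts, prior=None):
    """Bayes-rule inverse for one document (assumed reading of SPI, hypothetical helper).

    p(z | w) is proportional to p(w | z) p(z); the document's topic proportions
    are the count-weighted average of p(z | w) over its tokens.
    """
    K = A.shape[1]
    p_z = np.full(K, 1.0 / K) if prior is None else np.asarray(prior, dtype=float)
    joint = A * p_z                               # (V, K): p(w | z) p(z)
    word_mass = joint.sum(axis=1, keepdims=True)  # (V, 1): p(w)
    posterior = np.divide(joint, word_mass,       # (V, K): p(z | w), zero for unseen words
                          out=np.zeros_like(joint), where=word_mass > 0)
    return counts @ posterior / counts.sum()


# Toy example: 4 words, 2 topics; each column of A sums to 1.
A = np.array([[0.5, 0.0],
              [0.3, 0.1],
              [0.2, 0.2],
              [0.0, 0.7]])
counts = np.array([5.0, 3.0, 2.0, 0.0])  # word frequencies equal topic 0's column exactly

print(tli_estimate(A, counts))  # ≈ [1. 0.]
print(spi_estimate(A, counts))  # ≈ [0.825 0.175]
```

On this toy document the pseudo-inverse recovers the generating proportions exactly, while the Bayes-rule average is pulled toward topic 1 by the words the two topics share. SPI here uses only a marginal topic prior p(z); neither sketch uses the topic-correlation prior that distinguishes PADD.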
