CL, IR, LG | Jun 1, 2016

On a Topic Model for Sentences

arXiv:1606.00253v1 | 48 citations
Originality: Incremental advance
AI Analysis

This work addresses a known limitation of topic models for text analysis, the loss of sentence-level structure, though as an extension of LDA it appears incremental.

Standard topic models discard sentence-level structure. The paper proposes sentenceLDA, an extension of LDA that incorporates text structure into both the generative and the inference process. It reports advantages over LDA in perplexity and in text classification on several text collections.

Probabilistic topic models are generative models that describe the content of documents by discovering the latent topics underlying them. However, the structure of the textual input, for instance the grouping of words into coherent text spans such as sentences, contains much information that is generally lost with these models. In this paper, we propose sentenceLDA, an extension of LDA whose goal is to overcome this limitation by incorporating the structure of the text in the generative and inference processes. We illustrate the advantages of sentenceLDA by comparing it with LDA using both intrinsic (perplexity) and extrinsic (text classification) evaluation tasks on different text collections.
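
The abstract does not spell out how sentence structure enters the generative process. As a minimal sketch of one plausible reading, the Python below assumes that a single topic is drawn per sentence and that every word in the sentence is then drawn from that topic, whereas plain LDA draws a topic for each word. The function names, the symmetric prior `alpha`, and the `topic_word` matrix are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet distribution by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(sentence_lengths, topic_word, alpha, rng):
    """Generate one document as a list of (topic, word_ids) sentences.

    Assumption (not stated in the abstract): one topic per sentence.
    topic_word[k][v] is the probability of word v under topic k.
    """
    num_topics = len(topic_word)
    vocab = range(len(topic_word[0]))
    # Document-level topic proportions, as in LDA.
    theta = sample_dirichlet([alpha] * num_topics, rng)
    document = []
    for length in sentence_lengths:
        # One topic for the whole sentence; plain LDA would redraw per word.
        z = rng.choices(range(num_topics), weights=theta, k=1)[0]
        words = rng.choices(vocab, weights=topic_word[z], k=length)
        document.append((z, words))
    return document


if __name__ == "__main__":
    rng = random.Random(0)
    # Two toy topics over a four-word vocabulary with disjoint support.
    topic_word = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
    for topic, words in generate_document([3, 2, 4], topic_word, 0.5, rng):
        print(topic, words)
```

Under this assumption, all words of a sentence share one topic assignment, which is how the grouping of words into sentences would constrain the model.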

Code Implementations: 1 repo