CL, AI, IR, LG · Sep 15, 2018

Document Informed Neural Autoregressive Topic Models with Distributional Prior

arXiv:1809.06709v2 · 58 citations
Originality: Highly original
AI Analysis

This work addresses limitations in topic modeling for both long and short texts, offering improved performance for researchers and practitioners in natural language processing.

The authors tackled two challenges in topic modeling, incorporating full document context and handling data sparsity, by extending neural autoregressive topic models to exploit context and to use embedding-based priors. The resulting variants consistently outperformed state-of-the-art models across 15 datasets in generalization, interpretability, and applicability.

We address two challenges in topic models. (1) The context around a word helps determine its actual meaning, e.g., "networks" in the contexts "artificial neural networks" vs. "biological neuron networks". Generative topic models infer topic-word distributions while taking little or no context into account. Here, we extend a neural autoregressive topic model to exploit the full context around words in a document in a language-modeling fashion. The proposed model is named iDocNADE. (2) Because short texts contain few word occurrences (i.e., little context), and a corpus of few documents suffers from data sparsity, applying topic models to such texts is challenging. We therefore propose a simple and efficient way of incorporating external knowledge into neural autoregressive topic models: we use word embeddings as a distributional prior. The proposed variants are named DocNADEe and iDocNADEe. Our novel neural autoregressive topic model variants consistently outperform state-of-the-art generative topic models in terms of generalization, interpretability (topic coherence), and applicability (retrieval and classification) over 7 long-text and 8 short-text datasets from diverse domains.
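The two ideas in the abstract can be illustrated with a minimal NumPy sketch, assuming the standard DocNADE conditional p(v_i | v_<i) = softmax(b + U g(c + Σ_{k<i} W[:, v_k])). Full context (iDocNADE-style) is approximated by averaging a left-to-right pass and a right-to-left pass over the document, and the embedding prior (DocNADEe-style) adds a weighted column of a fixed embedding matrix `E` to the hidden pre-activation. All names here (`docnade_log_likelihood`, `idocnade_log_likelihood`, `lam`), the sigmoid activation, the full softmax (DocNADE implementations may use a faster tree softmax), and the choice to share `W`/`U` across directions with per-direction biases are assumptions for illustration, not the authors' released code.

```python
import numpy as np


def log_softmax(z):
    """Numerically stable log-softmax over a 1-D array."""
    z = z - z.max()
    return z - np.log(np.exp(z).sum())


def docnade_log_likelihood(doc, W, U, b, c, E=None, lam=0.0):
    """Left-to-right log p(v) = sum_i log p(v_i | v_<i) for one document.

    doc: sequence of word indices v_1..v_D into a vocabulary of size K
    W:   (H, K) word-encoding matrix;  U: (K, H) decoding matrix
    b:   (K,) output bias;             c: (H,) hidden bias
    E:   optional fixed (H, K) embedding matrix used as a prior (assumed
         DocNADEe-style), weighted by lam; assumes embedding size == H
    """
    acc = np.zeros(W.shape[0])  # running sum over columns of the words v_<i
    total = 0.0
    for v in doc:
        h = 1.0 / (1.0 + np.exp(-(c + acc)))  # hidden state h_i (sigmoid is an assumption)
        total += log_softmax(b + U @ h)[v]     # log-probability of the observed word v_i
        acc += W[:, v]                          # v_i becomes context for later positions
        if E is not None:
            acc += lam * E[:, v]                # embedding-prior contribution
    return total


def idocnade_log_likelihood(doc, W, U, b_fwd, b_bwd, c_fwd, c_bwd, E=None, lam=0.0):
    """Average of forward and backward log-likelihoods (assumed iDocNADE-style).

    The backward term conditions each word on the words after it, p(v_i | v_>i),
    which equals a forward pass over the reversed document. W and U are shared
    across directions; biases are direction-specific (an assumption).
    """
    fwd = docnade_log_likelihood(doc, W, U, b_fwd, c_fwd, E, lam)
    bwd = docnade_log_likelihood(list(doc)[::-1], W, U, b_bwd, c_bwd, E, lam)
    return 0.5 * (fwd + bwd)


if __name__ == "__main__":
    # Toy example with random parameters (hypothetical sizes).
    rng = np.random.default_rng(0)
    K, H = 50, 8
    W = rng.normal(scale=0.1, size=(H, K))
    U = rng.normal(scale=0.1, size=(K, H))
    E = rng.normal(scale=0.1, size=(H, K))  # stand-in for pretrained embeddings
    doc = [3, 17, 17, 42, 5]
    zK, zH = np.zeros(K), np.zeros(H)
    print(idocnade_log_likelihood(doc, W, U, zK, zK, zH, zH))              # no prior
    print(idocnade_log_likelihood(doc, W, U, zK, zK, zH, zH, E=E, lam=0.5))  # with prior
```

Setting `lam=0` recovers the model without the prior on the same parameters, which is a convenient way to isolate the effect of the embedding term. The running sum `acc` keeps each pass linear in document length, which is the property that makes autoregressive topic models like DocNADE practical.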

Code Implementations: 1 repo