CL · Jan 20, 2016

Hierarchical Latent Word Clustering

arXiv:1601.05472v1
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for hierarchical word clustering in text analysis, but it appears incremental as it extends an existing method without clear broad impact.

The paper tackles the problem of extracting tree-structured word clusters from text data by proposing a new Bayesian non-parametric model based on Hierarchical Dirichlet Allocation, and it reports meaningful hierarchical structures on the NIPS corpus and on radiology reports.

This paper presents a new Bayesian non-parametric model that extends the usage of Hierarchical Dirichlet Allocation to extract tree-structured word clusters from text data. The inference algorithm of the model collects words in a cluster if they share a similar distribution over documents. In our experiments, we observed meaningful hierarchical structures on the NIPS corpus and on radiology reports collected from public repositories.
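To make the grouping criterion concrete, the following is a minimal sketch of what "a similar distribution over documents" could mean for a pair of words. It is not the paper's model or inference algorithm: the toy corpus, the helper names `word_doc_distributions` and `hellinger`, and the choice of Hellinger distance are all assumptions made for illustration only.

```python
import math
from collections import Counter


def word_doc_distributions(docs):
    """Map each word to its normalized distribution over document indices.

    Hypothetical helper: counts how often a word occurs in each document,
    then normalizes the counts so they sum to 1 for that word.
    """
    counts = {}
    for d, doc in enumerate(docs):
        for word, n in Counter(doc.split()).items():
            counts.setdefault(word, [0.0] * len(docs))[d] += n
    return {w: [c / sum(row) for c in row] for w, row in counts.items()}


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


# Toy corpus (assumed): two "neuroscience" documents and two "kernel methods" documents.
docs = [
    "neuron spike neuron cortex",
    "spike cortex neuron",
    "kernel margin kernel svm",
    "svm margin kernel",
]

dists = word_doc_distributions(docs)

# Words that occur in the same documents are close; words with disjoint support are far apart.
near = hellinger(dists["neuron"], dists["spike"])
far = hellinger(dists["neuron"], dists["kernel"])
print(f"neuron vs spike:  {near:.3f}")
print(f"neuron vs kernel: {far:.3f}")
```

In this sketch, "neuron" and "spike" end up much closer to each other than "neuron" and "kernel", which is the intuition behind placing the first pair in the same cluster. The paper's Bayesian non-parametric model would infer such groupings, and their tree structure, probabilistically rather than through a fixed pairwise distance.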

