ML CL LG | Oct 16, 2014

Graph-Sparse LDA: A Topic Model with Structured Sparsity

Finale Doshi-Velez, Byron Wallace, Ryan Adams

arXiv:1410.4510v2 | 57 citations

Originality: Incremental advance

AI Analysis

This work addresses interpretability issues in topic modeling for domains like biomedicine, representing an incremental improvement by incorporating structured sparsity.

The authors address poor interpretability in topic models by introducing Graph-Sparse LDA, which uses word relationships encoded in an ontology to produce sparse, interpretable topics. On two real-world biomedical datasets, the model recovered sparse topic summaries while matching state-of-the-art prediction performance.

Originally designed to model text, topic modeling has become a powerful tool for uncovering latent structure in domains including medicine, finance, and vision. The goals for the model vary depending on the application: in some cases, the discovered topics may be used for prediction or some other downstream task. In other cases, the content of the topic itself may be of intrinsic scientific interest. Unfortunately, even using modern sparse techniques, the discovered topics are often difficult to interpret due to the high dimensionality of the underlying space. To improve topic interpretability, we introduce Graph-Sparse LDA, a hierarchical topic model that leverages knowledge of relationships between words (e.g., as encoded by an ontology). In our model, topics are summarized by a few latent concept-words from the underlying graph that explain the observed words. Graph-Sparse LDA recovers sparse, interpretable summaries on two real-world biomedical datasets while matching state-of-the-art prediction performance.
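The idea that a few concept-words can explain many observed words can be illustrated with a small sketch. This is not the authors' implementation: the toy ontology, the function name `concept_to_word_matrix`, and the uniform weighting of each concept over the words it covers are all assumptions made for illustration. The actual model infers these quantities from data, whereas here they are fixed by hand.

```python
import numpy as np

# Hypothetical toy ontology: each concept-word maps to the observed words
# it can explain (itself plus its descendants). Names are illustrative only.
ontology = {
    "antibiotic": ["antibiotic", "penicillin", "amoxicillin"],
    "analgesic": ["analgesic", "aspirin", "ibuprofen"],
}

vocab = sorted({w for words in ontology.values() for w in words})
concepts = sorted(ontology)  # ["analgesic", "antibiotic"]
word_index = {w: i for i, w in enumerate(vocab)}


def concept_to_word_matrix(ontology, concepts, word_index):
    """Build a (concepts x words) matrix whose row c is a distribution over
    the words concept c can explain. Uniform weights are an assumption."""
    P = np.zeros((len(concepts), len(word_index)))
    for c, concept in enumerate(concepts):
        for w in ontology[concept]:
            P[c, word_index[w]] = 1.0
        P[c] /= P[c].sum()  # normalize the row so it sums to 1
    return P


P = concept_to_word_matrix(ontology, concepts, word_index)

# A sparse topic: all probability mass on the single concept "antibiotic".
topic_over_concepts = np.array([0.0, 1.0])  # order follows `concepts`

# The induced distribution over observed words spreads that mass across
# every word the concept covers in the graph.
topic_over_words = topic_over_concepts @ P

for word, prob in zip(vocab, topic_over_words):
    print(f"{word:12s} {prob:.3f}")
```

In this sketch the topic is described by one concept-word, yet it assigns nonzero probability to three observed words. That is the sense in which a summary over the graph can be much sparser than the word distribution it explains.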
