Topic Similarity Networks: Visual Analytics for Large Document Sets
This work addresses the challenge of analyzing and summarizing large unstructured document sets for researchers and analysts. The contribution is incremental: it builds on existing LDA methods and adds new visualization techniques.
The paper aims to improve the interpretability of LDA topic models by introducing topic similarity networks, which visualize latent topics and the connections among them to support exploration of large text collections. The approach is demonstrated through case studies on NSF grants and Wikipedia, which show that it can reveal non-obvious document connections and thematic insights.
We investigate ways to improve the interpretability of LDA topic models by better analyzing and visualizing their outputs. We focus on what we refer to as topic similarity networks: graphs in which nodes represent latent topics in text collections and links represent similarity among topics. We describe efficient and effective approaches to both building and labeling such networks. Visualizations of topic models based on these networks are shown to be a powerful means of exploring, characterizing, and summarizing large collections of unstructured text documents. They help to "tease out" non-obvious connections among different sets of documents and provide insights into how topics form larger themes. We demonstrate the efficacy and practicality of these approaches through two case studies: 1) NSF grants for basic research spanning a 14-year period and 2) the entire English portion of Wikipedia.
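To make the idea of a topic similarity network concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not state which similarity measure or edge-selection rule is used, so cosine similarity between topic-word distributions and a fixed threshold are assumptions made here for illustration. The function names, the `threshold` parameter, and the toy distributions are all hypothetical.

```python
import math
from itertools import combinations


def cosine_similarity(p, q):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0
    return dot / (norm_p * norm_q)


def build_topic_similarity_network(topic_word, threshold=0.5):
    """Return {(i, j): similarity} for topic pairs at or above the threshold.

    topic_word: one word-probability vector per topic, all over the same
    vocabulary (for example, the topic-word matrix of a fitted LDA model).
    Assumption: cosine similarity plus a fixed cutoff; the source does not
    specify either choice.
    """
    edges = {}
    for i, j in combinations(range(len(topic_word)), 2):
        sim = cosine_similarity(topic_word[i], topic_word[j])
        if sim >= threshold:
            edges[(i, j)] = sim
    return edges


# Toy topic-word distributions over a 4-word vocabulary (made-up values).
toy_topics = [
    [0.7, 0.2, 0.1, 0.0],  # topic 0
    [0.6, 0.3, 0.1, 0.0],  # topic 1, close to topic 0
    [0.0, 0.1, 0.2, 0.7],  # topic 2, concentrated on a different word
]

network = build_topic_similarity_network(toy_topics, threshold=0.5)
for (i, j), sim in sorted(network.items()):
    print(f"topic {i} -- topic {j}: similarity {sim:.3f}")
```

In this toy input only topics 0 and 1 are linked, because they place most of their probability mass on the same words. The resulting edge dictionary could be passed to a graph layout tool for visualization; other measures over probability distributions, such as Jensen-Shannon divergence, could be swapped in for cosine similarity.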