Computer-Assisted Text Analysis for Social Science: Topic Models and Beyond
The paper provides a literature review to help social scientists understand and apply topic models, but its contribution is incremental because it synthesizes existing developments.
The paper reviews the evolution of topic modeling, covering extensions for document covariates, evaluation methods, and interactive visualizations, and discusses their relevance and applications in social science research.
Topic models are a family of statistical algorithms for summarizing, exploring, and indexing large collections of text documents. After a decade of research led by computer scientists, topic models have spread to social science as a new generation of data-driven social scientists has searched for tools to explore large collections of unstructured text. Recently, social scientists have contributed to the topic modeling literature with developments in causal inference and tools for handling the problem of multi-modality. In this paper, I provide a literature review on the evolution of topic modeling, including extensions for document covariates, methods for evaluation and interpretation, and advances in interactive visualizations, along with each aspect's relevance and application for social science research.