The Dynamic Embedded Topic Model
This work addresses topic modeling of sequential text data, such as academic abstracts or debates, aiming for better predictive performance and shorter fitting time than existing dynamic topic models.
The paper models how topics in sequential documents evolve over time. It develops the dynamic embedded topic model (D-ETM), which combines dynamic latent Dirichlet allocation (D-LDA) with word embeddings. Compared to D-LDA, the D-ETM performs better on a document completion task and learns more diverse and coherent topics.
Topic modeling analyzes documents to learn meaningful patterns of words. For documents collected in sequence, dynamic topic models capture how these patterns vary over time. We develop the dynamic embedded topic model (D-ETM), a generative model of documents that combines dynamic latent Dirichlet allocation (D-LDA) and word embeddings. The D-ETM models each word with a categorical distribution parameterized by the inner product between the word embedding and a per-time-step embedding representation of its assigned topic. The D-ETM learns smooth topic trajectories by defining a random walk prior over the embedding representations of the topics. We fit the D-ETM using structured amortized variational inference with a recurrent neural network. On three different corpora---a collection of United Nations debates, a set of ACL abstracts, and a dataset of Science Magazine articles---we found that the D-ETM outperforms D-LDA on a document completion task. We further found that the D-ETM learns more diverse and coherent topics than D-LDA while requiring significantly less time to fit.
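The generative pieces described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes that the inner products are normalized with a softmax to obtain the categorical distribution, and that the random walk prior uses isotropic Gaussian increments. The names `rho`, `alpha`, `step_std`, and both function names are hypothetical, and the embeddings are random placeholders rather than fitted values.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def sample_topic_trajectories(num_steps, num_topics, embed_dim, step_std, rng):
    """Draw topic embeddings from a random walk: alpha[t] = alpha[t-1] + noise.

    Assumption: Gaussian increments with standard deviation `step_std`.
    Returns an array of shape (num_steps, num_topics, embed_dim).
    """
    alpha = np.empty((num_steps, num_topics, embed_dim))
    alpha[0] = rng.normal(size=(num_topics, embed_dim))
    for t in range(1, num_steps):
        noise = step_std * rng.normal(size=(num_topics, embed_dim))
        alpha[t] = alpha[t - 1] + noise
    return alpha


def topic_word_distributions(rho, alpha):
    """Per-time-step word distributions for every topic.

    rho:   (vocab_size, embed_dim) word embeddings.
    alpha: (num_steps, num_topics, embed_dim) topic embeddings.
    Returns beta of shape (num_steps, num_topics, vocab_size), where each
    beta[t, k] is a categorical distribution over the vocabulary.
    Assumption: the inner products are passed through a softmax.
    """
    logits = alpha @ rho.T  # inner product of each topic with each word
    return softmax(logits, axis=-1)


rng = np.random.default_rng(0)
rho = rng.normal(size=(50, 8))  # placeholder embeddings for 50 words
alpha = sample_topic_trajectories(
    num_steps=5, num_topics=3, embed_dim=8, step_std=0.1, rng=rng
)
beta = topic_word_distributions(rho, alpha)
print(beta.shape)  # (5, 3, 50)
```

A small `step_std` keeps consecutive topic embeddings close, which is one way to picture the smooth topic trajectories the prior is meant to encourage. Inference (the structured amortized variational scheme with a recurrent neural network) is outside the scope of this sketch.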