Modeling Online Discourse with Coupled Distributed Topics
This work addresses the challenge of modeling socially generated corpora, such as online forums, and is aimed at researchers in natural language processing and social media analysis. Its contribution is incremental in that it builds on existing topic modeling approaches.
The paper models online discourse with a deep topic model that incorporates structural relationships and distributed representations. The model scales efficiently to large data, such as a 13M-comment Reddit dataset, and is evaluated against existing methods on metrics such as perplexity.
In this paper, we propose a deep, globally normalized topic model that incorporates structural relationships connecting documents in socially generated corpora, such as online forums. Our model (1) captures discursive interactions along observed reply links in addition to traditional topic information, and (2) incorporates latent distributed representations arranged in a deep architecture, which enables a GPU-based mean-field inference procedure that scales efficiently to large data. We apply our model to a new social media dataset consisting of 13M comments mined from the popular internet forum Reddit, a domain that poses significant challenges to models that do not account for relationships connecting user comments. We evaluate against existing methods across multiple metrics including perplexity and metadata prediction, and qualitatively analyze the learned interaction patterns.
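Perplexity, one of the evaluation metrics named above, can be illustrated with a minimal sketch. The code below uses the standard per-token definition (the exponential of the average negative log-likelihood of held-out tokens). The function name `perplexity` and the toy probabilities are assumptions made for illustration; the sketch does not reproduce the paper's evaluation pipeline or its model.

```python
import math


def perplexity(token_log_probs):
    """Per-token perplexity from natural-log probabilities of held-out tokens.

    Hypothetical helper for illustration only: `token_log_probs` is a flat
    list of log p(token) values produced by some model.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    # Average negative log-likelihood per token.
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    # Perplexity is the exponential of that average.
    return math.exp(avg_nll)


if __name__ == "__main__":
    # Toy case: a model that assigns probability 0.25 to each of four tokens.
    toy_log_probs = [math.log(0.25)] * 4
    print(perplexity(toy_log_probs))
```

Lower values indicate that the model assigns higher probability to the held-out text. A model that spreads probability uniformly over k choices at every token has a perplexity of approximately k.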