CL MM · May 4, 2021

GraphTMT: Unsupervised Graph-based Topic Modeling from Video Transcripts

Lukas Stappen, Jason Thies, Gerhard Hagerer, Björn W. Schuller, Georg Groh

arXiv:2105.01466v4 · 0.72 citations · Has Code

Originality Synthesis-oriented

AI Analysis

The paper addresses the problem of extracting topics from video data for social media analysis. The contribution is incremental, since it adapts existing graph-based clustering to a new domain.

The paper tackles topic modeling from video transcripts by proposing GraphTMT, an unsupervised graph-based method that does not require knowing the true number of topics. GraphTMT outperforms baseline methods on the MuSe-CaR dataset and is also applied to the Citysearch corpus.

To unfold the tremendous amount of multimedia data uploaded daily to social media platforms, effective topic modeling techniques are needed. Existing work tends to apply topic models to written text datasets. In this paper, we propose a topic extractor for video transcripts. Exploiting neural word embeddings through graph-based clustering, we aim to improve usability and semantic coherence. Unlike most topic models, this approach works without knowing the true number of topics, which is important when no such assumption can or should be made. Experimental results on the real-life multimodal dataset MuSe-CaR demonstrate that our approach, GraphTMT, extracts coherent and meaningful topics and outperforms baseline methods. Furthermore, we successfully demonstrate the applicability of our approach on the popular Citysearch corpus.
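The general idea of clustering word embeddings on a graph without fixing the number of topics in advance can be illustrated with a minimal sketch. This is not the GraphTMT algorithm, whose details the abstract does not give. It assumes a simple stand-in technique: words are connected when their cosine similarity exceeds a threshold, and each connected component is treated as one topic. The function name `graph_topics`, the `threshold` parameter, and the two-dimensional toy vectors are all hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def graph_topics(embeddings, threshold=0.8):
    """Group words into topics via connected components of a similarity graph.

    Assumption: an edge links two words whose cosine similarity is at least
    `threshold`. The number of topics is not an input; it falls out of the
    graph structure.
    """
    words = sorted(embeddings)
    neighbours = defaultdict(set)
    for i, w in enumerate(words):
        for x in words[i + 1:]:
            if cosine(embeddings[w], embeddings[x]) >= threshold:
                neighbours[w].add(x)
                neighbours[x].add(w)

    # Depth-first search to collect each connected component.
    seen, topics = set(), []
    for w in words:
        if w in seen:
            continue
        seen.add(w)
        stack, component = [w], []
        while stack:
            current = stack.pop()
            component.append(current)
            for n in neighbours[current]:
                if n not in seen:
                    seen.add(n)
                    stack.append(n)
        topics.append(sorted(component))
    return topics


# Hypothetical 2-D "embeddings"; real word vectors have hundreds of dimensions.
toy = {
    "engine": (0.9, 0.1),
    "motor": (0.85, 0.2),
    "seat": (0.1, 0.9),
    "leather": (0.2, 0.95),
}
topics = graph_topics(toy)
print(topics)
```

With these toy vectors, the words split into two groups, one for "engine" and "motor" and one for "leather" and "seat". Raising or lowering `threshold` changes how many groups emerge, which is the trade-off any threshold-based graph construction has to tune.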
