IR, Jul 20, 2016

A Local-Global LDA Model for Discovering Geographical Topics from Social Media

arXiv:1607.05806v14.85 citations, h-index: 24

Originality: Incremental advance

AI Analysis

This work addresses the problem of extracting event-related topics from geo-tagged social media for applications like regional analysis, but it is incremental as it builds on existing LDA methods.

The paper tackles the challenge of discovering geographical topics from noisy social media data by proposing a local-global LDA model that filters irrelevant words based on context weights. It demonstrates improved performance over baseline methods in metrics like perplexity and KL-divergence using Weibo data.

Micro-blogging services can track users' geo-locations when users check in at places or use geo-tagging, which implicitly reveals their locations. This "geo tracking" can help to find topics triggered by events in certain regions. However, discovering such topics is very challenging because of the large number of noisy messages (e.g. daily conversations). This paper proposes a method to model geographical topics that filters out irrelevant words by assigning them different weights in the local and global contexts. Our method is based on the Latent Dirichlet Allocation (LDA) model, but each word is generated from either a local or a global topic distribution according to its generation probabilities. We evaluated our model with data collected from Weibo, which is currently the most popular micro-blogging service for Chinese users. The evaluation results demonstrate that our method outperforms baseline methods on several metrics, including model perplexity, two kinds of entropy, and the KL-divergence of discovered topics.
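The local/global generation step described in the abstract can be illustrated with a toy sampler. This is only a minimal sketch under stated assumptions, not the paper's model: the function names (`sample_index`, `generate_message`), the fixed switch probability `p_local`, and the hand-written toy distributions are all hypothetical, and the Dirichlet priors, context weights, and inference procedure of the actual method are not shown.

```python
import random

def sample_index(probs, rng):
    """Draw an index from a discrete distribution given as a list of probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate_message(n_words, local_theta, global_theta, phi, p_local, rng):
    """Sample word ids for one message from a simplified local/global mixture.

    Assumption: a single fixed probability p_local decides whether a word is
    drawn from the local or the global topic distribution.
    """
    words, sources = [], []
    for _ in range(n_words):
        # Switch: does this word come from the local or the global context?
        is_local = rng.random() < p_local
        theta = local_theta if is_local else global_theta
        # Draw a topic from the chosen topic distribution, then a word from that topic.
        topic = sample_index(theta, rng)
        words.append(sample_index(phi[topic], rng))
        sources.append("local" if is_local else "global")
    return words, sources

# Toy parameters (illustrative only): 2 topics over a 4-word vocabulary.
phi = [
    [0.7, 0.2, 0.1, 0.0],  # topic 0: word distribution
    [0.0, 0.1, 0.2, 0.7],  # topic 1: word distribution
]
local_theta = [0.9, 0.1]   # topic mixture of the message's region
global_theta = [0.2, 0.8]  # topic mixture shared across all regions

rng = random.Random(0)
words, sources = generate_message(10, local_theta, global_theta, phi, 0.6, rng)
```

In this simplified picture, words attributed to the global source play the role of background chatter, while words attributed to the local source carry the region-specific topic.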
