CL · Nov 19, 2018

A Comparative Analysis of Content-based Geolocation in Blogs and Tweets

arXiv:1811.07497v1
Originality: Incremental advance
AI Analysis

This work addresses geolocation for social media analysis, offering an incremental improvement in feature design and an evaluation of cross-media prediction.

The paper compares text-based geolocation methods on Blogger and Twitter data. It introduces location-specific features that reduce error rates by up to 12.5% relative to the best previously proposed geolocation features, and finds that Blogger users are harder to geolocate than Twitter users despite writing longer posts.

The geolocation of online information is an essential component in any geospatial application. While most of the previous work on geolocation has focused on Twitter, in this paper we quantify and compare the performance of text-based geolocation methods on social media data drawn from both Blogger and Twitter. We introduce a novel set of location-specific features that are both highly informative and easily interpretable, and show that we can achieve error rate reductions of up to 12.5% with respect to the best previously proposed geolocation features. We also show that, despite posting longer text, Blogger users are significantly harder to geolocate than Twitter users. Additionally, we investigate the effect of training and testing on different media (cross-media predictions) and of combining multiple social media sources (multi-media predictions). Finally, we explore the geolocability of social media in relation to three user dimensions: state, gender, and industry.

