Where on Earth Do Users Say They Are?: Geo-Entity Linking for Noisy Multilingual User Input
This paper addresses the problem of linking location mentions in noisy, multilingual social media text, with applications such as social media analysis. The contribution is incremental, as it builds on existing embedding and confidence-score techniques.
The paper tackles geo-entity linking for noisy, multilingual social media data. It presents a method that represents real-world locations as averaged embeddings of labeled user-input location names and attaches an interpretable confidence score to each prediction, and it reports improved performance on a global, multilingual dataset.
Geo-entity linking is the task of linking a location mention to its real-world geographic location. In this paper, we explore the challenging task of geo-entity linking for noisy, multilingual social media data. Few open-source multilingual geo-entity linking tools are available, and existing ones are often either rule-based, which makes them brittle in social media settings, or LLM-based, which makes them too expensive for large-scale datasets. We present a method that represents real-world locations as averaged embeddings from labeled user-input location names and allows for selective prediction via an interpretable confidence score. We show that our approach improves geo-entity linking on a global and multilingual social media dataset, and we discuss progress and problems with evaluating at different geographic granularities.
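
The core idea of the method (one averaged embedding per location, plus a confidence score used to abstain) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the `embed` function is a toy hashed character-trigram encoder standing in for whatever multilingual text encoder is actually used, the confidence is simply the cosine similarity to the nearest location prototype, and the names `build_prototypes`, `link`, and the `threshold` value are hypothetical.

```python
import math
import zlib
from collections import defaultdict

DIM = 256  # size of the toy embedding space (arbitrary, illustrative choice)


def _normalize(vec):
    """Scale a vector to unit L2 norm (an all-zero vector is returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def embed(text):
    """Toy stand-in for a text encoder (assumption): hashed character trigrams."""
    vec = [0.0] * DIM
    padded = f"  {text.lower().strip()}  "
    for i in range(len(padded) - 2):
        trigram = padded[i:i + 3].encode("utf-8")
        vec[zlib.crc32(trigram) % DIM] += 1.0
    return _normalize(vec)


def build_prototypes(labeled_names):
    """Average the embeddings of all user-input names labeled with each location."""
    groups = defaultdict(list)
    for name, location in labeled_names:
        groups[location].append(embed(name))
    prototypes = {}
    for location, vecs in groups.items():
        mean = [sum(column) / len(vecs) for column in zip(*vecs)]
        prototypes[location] = _normalize(mean)
    return prototypes


def link(mention, prototypes, threshold=0.5):
    """Return (location, confidence); location is None when the linker abstains.

    Confidence is the cosine similarity to the nearest prototype (an assumption;
    the abstract does not specify how the score is computed).
    """
    query = embed(mention)
    best_location, best_score = None, -1.0
    for location, proto in prototypes.items():
        score = sum(a * b for a, b in zip(query, proto))
        if score > best_score:
            best_location, best_score = location, score
    if best_score < threshold:
        return None, best_score  # selective prediction: abstain on low confidence
    return best_location, best_score


if __name__ == "__main__":
    # Hypothetical labeled user-input location strings.
    labeled = [
        ("nyc", "New York City"),
        ("new york, ny", "New York City"),
        ("paris", "Paris"),
        ("paris, france", "Paris"),
    ]
    prototypes = build_prototypes(labeled)
    print(link("new york", prototypes))
    print(link("somewhere over the rainbow", prototypes))
```

Raising `threshold` trades coverage for precision: fewer mentions are linked, but those that are linked sit closer to a known location prototype. A real system would replace `embed` with a trained multilingual encoder and would likely calibrate the threshold on held-out data.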