CL, AI | Jul 22, 2024

Leveraging Large Language Models to Geolocate Linguistic Variations in Social Media Posts

arXiv:2407.16047v1 | 2 citations | h-index: 1 | Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses the problem of geolocating social media content for applications like disaster response or marketing, but it is incremental as it applies existing LLM methods to a specific dataset.

The paper tackles geolocalizing Italian tweets by fine-tuning large language models to predict both the region and the coordinates of each tweet, achieving state-of-the-art results in the GeoLingIt challenge.

Geolocalization of social media content is the task of determining the geographical location of a user based on textual data, which may show linguistic variations and informal language. In this project, we address the GeoLingIt challenge of geolocalizing tweets written in Italian by leveraging large language models (LLMs). GeoLingIt requires the prediction of both the region and the precise coordinates of the tweet. Our approach involves fine-tuning pre-trained LLMs to simultaneously predict these geolocalization aspects. By integrating innovative methodologies, we enhance the models' ability to understand the nuances of Italian social media text to improve the state-of-the-art in this domain. This work is conducted as part of the Large Language Models course at the Bertinoro International Spring School 2024. We make our code publicly available on GitHub at https://github.com/dawoz/geolingit-biss2024.
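The abstract does not say how the two predictions are combined during fine-tuning. One common way to train a model on a classification target (region) and a regression target (coordinates) at once is a weighted sum of a cross-entropy term and a distance term. The sketch below illustrates that idea with the Python standard library only. It is not the authors' implementation: the function names, the haversine distance as the regression term, and the weights `alpha` and `beta` are all assumptions made for illustration.

```python
import math


def cross_entropy(logits, target_index):
    """Negative log-likelihood of the target class under a softmax over logits."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target_index]


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * radius_km * math.asin(min(1.0, math.sqrt(a)))


def joint_loss(region_logits, true_region, pred_coords, true_coords,
               alpha=1.0, beta=0.01):
    """Hypothetical joint objective: alpha * region loss + beta * distance in km.

    alpha and beta are illustrative weights, not values from the paper.
    """
    region_term = cross_entropy(region_logits, true_region)
    distance_term = haversine_km(*pred_coords, *true_coords)
    return alpha * region_term + beta * distance_term


if __name__ == "__main__":
    # Toy example: three candidate regions, prediction near Rome, truth in Milan.
    logits = [2.0, 0.5, -1.0]
    rome = (41.9028, 12.4964)
    milan = (45.4642, 9.1900)
    print(joint_loss(logits, true_region=0, pred_coords=rome, true_coords=milan))
```

In an actual fine-tuning setup the two terms would be computed on model outputs inside a differentiable framework, and the relative weight of the terms would need tuning, since the cross-entropy is in nats while the distance is in kilometres.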

Code Implementations: 1 repo
