CL, AI, IR | Dec 29, 2024

Comparative Performance of Advanced NLP Models and LLMs in Multilingual Geo-Entity Detection

arXiv:2412.20414v1 | 10 citations | h-index: 2 | AICCONF

Originality: Synthesis-oriented

AI Analysis

The paper addresses precise geo-entity detection across languages for global security applications, but its contribution is incremental: it compares existing models on new data rather than proposing a new method.

This paper evaluated the performance of advanced NLP models and LLMs, including SpaCy, XLM-RoBERTa, mLUKE, GeoLM, GPT 3.5, and GPT 4, in detecting geo-entities from multilingual Telegram datasets in English, Russian, and Arabic, using metrics like accuracy, precision, recall, and F1 scores to identify their strengths and weaknesses.

The integration of advanced Natural Language Processing (NLP) methodologies and Large Language Models (LLMs) has significantly enhanced the extraction and analysis of geospatial data from multilingual texts, impacting sectors such as national and international security. This paper presents a comprehensive evaluation of leading NLP models -- SpaCy, XLM-RoBERTa, mLUKE, GeoLM -- and LLMs, specifically OpenAI's GPT 3.5 and GPT 4, within the context of multilingual geo-entity detection. Utilizing datasets from Telegram channels in English, Russian, and Arabic, we examine the performance of these models through metrics such as accuracy, precision, recall, and F1 scores, to assess their effectiveness in accurately identifying geospatial references. The analysis exposes each model's distinct advantages and challenges, underscoring the complexities involved in achieving precise geo-entity identification across varied linguistic landscapes. The conclusions drawn from this experiment aim to direct the enhancement and creation of more advanced and inclusive NLP tools, thus advancing the field of geospatial analysis and its application to global security.
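To make the evaluation metrics concrete, here is a minimal sketch of how precision, recall, and F1 could be computed for geo-entity detection. It is not the authors' evaluation code. It assumes that each message has a list of gold location mentions and a list of predicted mentions, that a prediction counts as correct only on an exact match after case and whitespace normalization, and that counts are micro-averaged over all messages. The function names and the sample data are hypothetical. Accuracy is left out because it needs a definition of true negatives, which this summary does not give.

```python
from collections import Counter


def normalize(mention: str) -> str:
    """Case-fold and collapse whitespace so 'Kyiv' and ' kyiv ' compare equal."""
    return " ".join(mention.casefold().split())


def score_geo_entities(gold, predicted):
    """Micro-averaged precision, recall, and F1 over per-message mention lists.

    Assumption: exact match on normalized strings; duplicates within a
    message are counted as separate mentions (multiset comparison).
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same messages")

    tp = fp = fn = 0
    for gold_mentions, pred_mentions in zip(gold, predicted):
        g = Counter(normalize(m) for m in gold_mentions)
        p = Counter(normalize(m) for m in pred_mentions)
        tp += sum((g & p).values())  # mentions found in both
        fp += sum((p - g).values())  # predicted but not in gold
        fn += sum((g - p).values())  # in gold but missed

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical sample: three messages, gold vs. model output.
    gold = [["Kyiv", "Ukraine"], ["Москва"], []]
    pred = [["kyiv"], ["москва", "Syria"], ["Damascus"]]
    print(score_geo_entities(gold, pred))
```

Running the same function separately on the English, Russian, and Arabic subsets would give per-language scores, which is the kind of comparison the abstract describes.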
