CL · AI · IR · LG · Jul 2, 2023

HeGeL: A Novel Dataset for Geo-Location from Hebrew Text

arXiv:2307.00509v1225 citations · h-index: 30 · Has Code
Originality: Synthesis-oriented
AI Analysis

This paper addresses textual geolocation for Hebrew, which benefits Hebrew speakers and researchers, but the contribution is incremental because it extends existing work to a new language.

The authors tackle the lack of textual geolocation datasets for Hebrew, a resource-poor language, by creating the HeGeL corpus of 5,649 literal place descriptions from three Israeli cities, and show that the data requires a novel environmental representation.

The task of textual geolocation - retrieving the coordinates of a place based on a free-form language description - calls for not only grounding but also natural language understanding and geospatial reasoning. Even though there are quite a few datasets in English used for geolocation, they are currently based on open-source data (Wikipedia and Twitter), where the location of the described place is mostly implicit, such that the location retrieval resolution is limited. Furthermore, there are no datasets available for addressing the problem of textual geolocation in morphologically rich and resource-poor languages, such as Hebrew. In this paper, we present the Hebrew Geo-Location (HeGeL) corpus, designed to collect literal place descriptions and analyze lingual geospatial reasoning. We crowdsourced 5,649 literal Hebrew place descriptions of various place types in three cities in Israel. Qualitative and empirical analyses show that the data exhibits abundant use of geospatial reasoning and requires a novel environmental representation.
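Because the task maps a description to coordinates, a prediction is naturally scored by how far it lands from the true location. The sketch below is illustrative only: the abstract does not state which evaluation metric the authors use, so the great-circle (haversine) distance, the `accuracy_within` helper, and the coordinates in the usage lines are all assumptions rather than the paper's method or data.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def accuracy_within(preds, golds, threshold_m):
    """Fraction of predictions within threshold_m metres of the gold point.

    Hypothetical helper: preds and golds are equal-length lists of (lat, lon).
    """
    hits = sum(haversine_m(*p, *g) <= threshold_m for p, g in zip(preds, golds))
    return hits / len(golds)


# Made-up coordinates for illustration; not taken from the HeGeL corpus.
predicted = [(32.0800, 34.7800), (32.7900, 34.9900)]
gold = [(32.0805, 34.7803), (32.8200, 35.0000)]
print(accuracy_within(predicted, gold, threshold_m=100))
```

A threshold-based accuracy like this is one common way to summarize distance errors. A mean or median distance over the same pairs would work just as well.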

Code Implementations: 1 repo
