CL · DL · IR · Sep 17, 2020

A Deep Learning Approach to Geographical Candidate Selection through Toponym Matching

arXiv:2009.08114v2 · 5 citations
AI Analysis

This work addresses geographical candidate selection, a problem relevant to researchers and practitioners who work with noisy or non-standard text. Its contribution is incremental: it applies existing neural network architectures to this specific domain.

The paper tackles candidate selection for toponym matching, a step that is crucial for entity resolution in noisy text. It introduces a deep learning method, evaluates it on new realistic datasets that include cross-lingual variations and OCR errors, and reports performance improvements on toponym resolution tasks.

Recognizing toponyms and resolving them to their real-world referents is required for providing advanced semantic access to textual data. This process is often hindered by the high degree of variation in toponyms. Candidate selection is the task of identifying the potential entities that a previously recognized toponym can refer to. Although it has traditionally received little attention in the research community, candidate selection has been shown to have a significant impact on the downstream task of entity resolution, especially in noisy or non-standard text. In this paper, we introduce a flexible deep learning method for candidate selection through toponym matching, using state-of-the-art neural network architectures. We perform an intrinsic toponym matching evaluation based on several new realistic datasets, which cover various challenging scenarios (cross-lingual and regional variations, as well as OCR errors). We report the method's performance on candidate selection in the context of the downstream task of toponym resolution, both on existing datasets and on a new manually annotated resource of nineteenth-century English OCR'd text.
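To make the candidate selection task concrete, here is a minimal Python sketch. It is not the authors' method: in place of their learned neural matcher, it scores a recognized toponym against gazetteer name variants using character-trigram Jaccard similarity, a simple string-matching baseline, and returns the top-k entities. The gazetteer contents, the entity identifiers, and the function names are all hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical gazetteer: each name variant maps to the entity IDs it may denote.
# "London" is ambiguous on purpose (two different places share the name).
GAZETTEER: Dict[str, List[str]] = {
    "London": ["london_uk", "london_on"],
    "Londres": ["london_uk"],  # cross-lingual variant of the same entity
    "Lyon": ["lyon_fr"],
}


def char_ngrams(text: str, n: int = 3) -> Set[str]:
    """Lowercased character n-grams, with '#' marking word boundaries."""
    padded = f"#{text.lower()}#"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two n-gram sets (stand-in for a learned matcher)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def select_candidates(
    toponym: str, gazetteer: Dict[str, List[str]], k: int = 3
) -> List[Tuple[str, str, float]]:
    """Return up to k (entity_id, best_matching_name, score) tuples, best first."""
    query = char_ngrams(toponym)
    best: Dict[str, Tuple[str, float]] = {}
    for name, entity_ids in gazetteer.items():
        score = jaccard(query, char_ngrams(name))
        for eid in entity_ids:
            # Keep only the best-scoring name variant for each entity.
            if eid not in best or score > best[eid][1]:
                best[eid] = (name, score)
    ranked = sorted(best.items(), key=lambda item: item[1][1], reverse=True)
    return [(eid, name, score) for eid, (name, score) in ranked[:k]]


if __name__ == "__main__":
    # An OCR-style error ("0" for "o") still retrieves both Londons as candidates.
    print(select_candidates("Lond0n", GAZETTEER, k=2))
```

The interface would stay the same if the trigram similarity were replaced by a trained matching model. The sketch also shows why candidate selection matters downstream: a single noisy mention yields several ambiguous candidates that a later resolution step must still disambiguate.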

Code Implementations: 2 repos
