CL · May 23, 2023

Arukikata Travelogue Dataset with Geographic Entity Mention, Coreference, and Link Annotation

arXiv:2305.13844v1 · 106 citations
Originality: Synthesis-oriented
AI Analysis

This dataset provides a resource for natural language processing researchers working on geographic information extraction, though the contribution is incremental because it builds on existing geoparsing concepts.

The authors address document-level geoparsing by building a Japanese travelogue dataset for evaluation: 200 documents, 12,171 mentions, 6,339 coreference clusters, and 2,551 linked geo-entities.

Geoparsing is a fundamental technique for analyzing geo-entity information in text. We focus on document-level geoparsing, which considers geographic relatedness among geo-entity mentions, and present a Japanese travelogue dataset designed for evaluating document-level geoparsing systems. Our dataset comprises 200 travelogue documents with rich geo-entity information: 12,171 mentions, 6,339 coreference clusters, and 2,551 geo-entities linked to geo-database entries.
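
The three annotation layers named in the abstract (mentions, coreference clusters, and links to geo-database entries) can be pictured with a minimal Python sketch. The class names, field names, and the `example-id-1` identifier are hypothetical and do not reflect the dataset's actual file format. The linked fraction additionally assumes that the 2,551 linked geo-entities are counted at the cluster level, which the abstract does not state explicitly.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Mention:
    """A geo-entity mention: a character span within one document."""
    start: int
    end: int
    text: str


@dataclass
class Cluster:
    """Mentions in one document that refer to the same geo-entity."""
    mentions: list[Mention] = field(default_factory=list)
    # Hypothetical field: ID of a geo-database entry, or None if unlinked.
    link_id: Optional[str] = None


def summarize(clusters: list[Cluster]) -> dict[str, float]:
    """Count mentions, clusters, and linked clusters."""
    n_mentions = sum(len(c.mentions) for c in clusters)
    n_linked = sum(1 for c in clusters if c.link_id is not None)
    return {
        "mentions": n_mentions,
        "clusters": len(clusters),
        "linked": n_linked,
        "mentions_per_cluster": n_mentions / len(clusters) if clusters else 0.0,
    }


# Toy document: one linked entity mentioned twice, one unlinked entity.
# Spans and the link ID are illustrative only.
toy = [
    Cluster([Mention(0, 2, "京都"), Mention(10, 13, "この街")], link_id="example-id-1"),
    Cluster([Mention(20, 23, "ホテル")]),
]
stats = summarize(toy)

# Ratios derived from the totals reported in the abstract.
MENTIONS, CLUSTERS, LINKED, DOCS = 12_171, 6_339, 2_551, 200
mentions_per_cluster = MENTIONS / CLUSTERS
mentions_per_doc = MENTIONS / DOCS
# Assumption: each linked geo-entity corresponds to one coreference cluster.
linked_fraction = LINKED / CLUSTERS
```

From the reported totals, this works out to about 1.92 mentions per cluster and about 61 mentions per document. Under the cluster-level assumption, roughly 40% of clusters are linked to a geo-database entry.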

Code Implementations: 1 repo
