CL May 23, 2023

Arukikata Travelogue Dataset with Geographic Entity Mention, Coreference, and Link Annotation

Shohei Higashiyama, Hiroki Ouchi, Hiroki Teranishi, Hiroyuki Otomo, Yusuke Ide, Aitaro Yamamoto, Hiroyuki Shindo, Yuki Matsuda, Shoko Wakamiya, Naoya Inoue, Ikuya Yamada, Taro Watanabe

arXiv:2305.13844v119.1106 citations Has Code

Originality Synthesis-oriented

AI Analysis

This dataset provides a resource for natural language processing researchers working on geographic information extraction, though the contribution is incremental, as it builds on existing geoparsing concepts.

The authors tackled the problem of document-level geoparsing by creating a Japanese travelogue dataset with 200 documents, 12,171 mentions, 6,339 coreference clusters, and 2,551 linked geo-entities for evaluation.

Geoparsing is a fundamental technique for analyzing geo-entity information in text. We focus on document-level geoparsing, which considers geographic relatedness among geo-entity mentions, and present a Japanese travelogue dataset designed for evaluating document-level geoparsing systems. Our dataset comprises 200 travelogue documents with rich geo-entity information: 12,171 mentions, 6,339 coreference clusters, and 2,551 geo-entities linked to geo-database entries.
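For a rough sense of scale, the counts quoted in the abstract can be turned into per-document averages. The sketch below is illustrative arithmetic only: the constant and function names are hypothetical and are not taken from the dataset's released code, and the results are simple means that say nothing about how annotations are distributed across documents.

```python
# Counts as reported in the abstract.
DOCUMENTS = 200
MENTIONS = 12_171
COREF_CLUSTERS = 6_339
LINKED_ENTITIES = 2_551


def per_document(total: int, documents: int = DOCUMENTS) -> float:
    """Return the mean count per document (hypothetical helper)."""
    return total / documents


def summarize() -> dict:
    """Derive simple averages from the reported totals."""
    return {
        "mentions_per_document": per_document(MENTIONS),
        "clusters_per_document": per_document(COREF_CLUSTERS),
        "linked_entities_per_document": per_document(LINKED_ENTITIES),
        # Mean number of mentions grouped into one coreference cluster.
        "mentions_per_cluster": MENTIONS / COREF_CLUSTERS,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:.2f}")
```

Under these figures, a document carries about 61 mentions, 32 coreference clusters, and 13 linked geo-entities on average, with just under two mentions per cluster.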
