A Chinese Corpus for Fine-grained Entity Typing
The corpus provides a resource for researchers working on Chinese NLP, but the contribution is incremental, since it adapts an existing task to a new language.
The authors tackled the lack of datasets for fine-grained entity typing in Chinese by introducing a manually labeled corpus with 4,800 mentions, and they demonstrated its utility through experiments with neural models and cross-lingual transfer learning.
Fine-grained entity typing is a challenging task with wide applications. However, most existing datasets for this task are in English. In this paper, we introduce a corpus for Chinese fine-grained entity typing that contains 4,800 mentions manually labeled through crowdsourcing. Each mention is annotated with free-form entity types. To make the dataset useful in more scenarios, we also categorize all the fine-grained types into 10 general types. Finally, we conduct experiments with several neural models whose architectures are typical of fine-grained entity typing and report how well they perform on our dataset. We also show that cross-lingual transfer learning can potentially improve Chinese fine-grained entity typing.
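To illustrate the two-level labeling described above, the sketch below shows one possible way to represent an annotated mention, with free-form fine-grained types rolled up into general types. It rests on several assumptions. The record layout (`Mention`, character-offset `span`) and the mapping `FINE_TO_GENERAL` are hypothetical. The type names are invented examples and are not the corpus's actual 10 general types. The sketch also assumes that each fine-grained type maps to exactly one general type.

```python
from dataclasses import dataclass, field

# Hypothetical fine-to-general mapping. The real corpus defines its own
# 10 general types; these names are illustrative only.
FINE_TO_GENERAL = {
    "singer": "person",
    "actor": "person",
    "city": "location",
    "university": "organization",
}


@dataclass
class Mention:
    sentence: str                                        # context sentence
    span: tuple[int, int]                                # [start, end) character offsets
    fine_types: set[str] = field(default_factory=set)   # free-form labels

    @property
    def text(self) -> str:
        # Slice the mention string out of its context sentence.
        return self.sentence[self.span[0]:self.span[1]]

    def general_types(self) -> set[str]:
        # Map each free-form label to its general type; unmapped labels are skipped.
        return {FINE_TO_GENERAL[t] for t in self.fine_types if t in FINE_TO_GENERAL}


m = Mention("周杰伦是一位歌手和演员。", (0, 3), {"singer", "actor"})
print(m.text, sorted(m.general_types()))  # 周杰伦 ['person']
```

Because several fine-grained labels can collapse into the same general type, `general_types` returns a set. A mention therefore ends up with one general label here, even though it carries two free-form labels.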