CL IR LG Jan 13, 2020

CLUENER2020: Fine-grained Named Entity Recognition Dataset and Benchmark for Chinese

Liang Xu, Yu tong, Qianqian Dong, Yixuan Liao, Cong Yu, Yin Tian, Weitang Liu, Lu Li, Caiquan Liu, Xuanwei Zhang

arXiv:2001.04351v43.571 citations Has Code

Originality: Synthesis-oriented

AI Analysis

This paper provides a new benchmark for fine-grained named entity recognition (NER) in Chinese, addressing the need for more diverse and realistic datasets in the field.

The authors introduced CLUENER2020, a fine-grained Chinese named entity recognition dataset with 10 categories, which is more challenging and reflective of real-world applications than existing datasets, and they provided baselines and a leaderboard for future research.

In this paper, we introduce the NER dataset from the CLUE organization (CLUENER2020), a well-defined fine-grained dataset for named entity recognition in Chinese. CLUENER2020 contains 10 categories. Apart from common labels like person, organization, and location, it contains more diverse categories. It is more challenging than other current Chinese NER datasets and could better reflect real-world applications. For comparison, we implement several state-of-the-art baselines as sequence labeling tasks and report human performance, along with an analysis of it. To facilitate future work on fine-grained NER for Chinese, we release our dataset, baselines, and leaderboard.
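The abstract describes the baselines as sequence labeling tasks. As a rough illustration of that framing only, the sketch below converts character-level entity spans into BIO tags, one tag per character. The span format `(start, end_exclusive, label)`, the helper name `spans_to_bio`, and the example sentence are assumptions made for illustration; they are not taken from the CLUENER2020 release or its baseline code.

```python
from typing import List, Tuple

# Hypothetical span format: (start, end_exclusive, label).
# This is an illustrative assumption, not the dataset's actual schema.
Span = Tuple[int, int, str]


def spans_to_bio(text: str, spans: List[Span]) -> List[str]:
    """Assign one BIO tag to each character of `text`."""
    tags = ["O"] * len(text)
    for start, end, label in spans:
        if not (0 <= start < end <= len(text)):
            raise ValueError(f"span out of range: {(start, end, label)}")
        if any(tag != "O" for tag in tags[start:end]):
            raise ValueError(f"overlapping span: {(start, end, label)}")
        # First character of the entity gets B-, the rest get I-.
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


# Made-up sentence ("Li Ming works at Peking University"), not a dataset sample.
example_text = "李明在北京大学工作"
example_spans = [(0, 2, "person"), (3, 7, "organization")]
example_tags = spans_to_bio(example_text, example_spans)

for char, tag in zip(example_text, example_tags):
    print(char, tag)
```

Tagging per character rather than per word avoids depending on a Chinese word segmenter, which is a common choice for Chinese NER; whether the released baselines do the same is not stated in the abstract.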
