CL · IR · LG · Jan 13, 2020

CLUENER2020: Fine-grained Named Entity Recognition Dataset and Benchmark for Chinese

arXiv:2001.04351v4 · 71 citations
AI Analysis

This paper provides a new benchmark for fine-grained NER in Chinese, addressing the need for more diverse and realistic datasets in the field.

The authors introduced CLUENER2020, a fine-grained Chinese named entity recognition dataset with 10 categories, which is more challenging and reflective of real-world applications than existing datasets, and they provided baselines and a leaderboard for future research.

In this paper, we introduce the NER dataset from the CLUE organization (CLUENER2020), a well-defined fine-grained dataset for named entity recognition in Chinese. CLUENER2020 contains 10 categories. Apart from common labels like person, organization, and location, it contains more diverse categories. It is more challenging than other current Chinese NER datasets and could better reflect real-world applications. For comparison, we implement several state-of-the-art baselines as sequence labeling tasks and report human performance, together with an analysis of it. To facilitate future work on fine-grained NER for Chinese, we release our dataset, baselines, and leaderboard.
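To make "baselines as sequence labeling tasks" concrete, here is a minimal sketch of how character-level entity spans might be turned into BIO tags, the usual input for a sequence labeling model. The span format (start, inclusive end, category), the function name `spans_to_bio`, and the example sentence are assumptions for illustration; they are not taken from the paper or from the released dataset files.

```python
from typing import List, Tuple

# Hypothetical span format: (start index, inclusive end index, category).
# This layout is an assumption, not the dataset's documented schema.
Span = Tuple[int, int, str]


def spans_to_bio(text: str, spans: List[Span]) -> List[str]:
    """Assign one BIO tag per character of `text`."""
    tags = ["O"] * len(text)  # every character starts outside any entity
    for start, end, category in spans:
        if not (0 <= start <= end < len(text)):
            raise ValueError(f"span ({start}, {end}) is out of range")
        tags[start] = f"B-{category}"  # first character of the entity
        for i in range(start + 1, end + 1):
            tags[i] = f"I-{category}"  # remaining characters of the entity
    return tags


# Illustrative sentence using two of the common labels named in the abstract.
example_text = "张三在北京工作"
example_spans = [(0, 1, "person"), (3, 4, "location")]
example_tags = spans_to_bio(example_text, example_spans)

for char, tag in zip(example_text, example_tags):
    print(char, tag)
```

Tagging per character rather than per word is a common choice for Chinese, since it avoids depending on a word segmenter. A fine-grained label set simply adds more `B-`/`I-` pairs to the tag vocabulary.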

Code Implementations (3 repos)