CL Dec 16, 2021

Simple Questions Generate Named Entity Recognition Datasets

Hyunjae Kim, Jaehyo Yoo, Seunghyun Yoon, Jinhyuk Lee, Jaewoo Kang

arXiv:2112.08808v422.5290 citations, Has Code

Originality: Incremental advance

AI Analysis

This work addresses the need for efficient NER dataset creation in low-resource settings and reduces reliance on domain experts. It is incremental in that it builds on existing QA systems.

This research tackles the problem of generating named entity recognition (NER) datasets without extensive human annotation by posing simple questions to an open-domain QA system. Models trained only on the generated datasets outperform strong low-resource models by an average of 19.4 F1 points across six NER benchmarks, and in few-shot NER they improve on the previous best model by 5.2 F1 points on three benchmarks, setting a new state of the art.

Recent named entity recognition (NER) models often rely on human-annotated datasets, requiring the significant engagement of professional knowledge on the target domain and entities. This research introduces an ask-to-generate approach that automatically generates NER datasets by asking questions in simple natural language to an open-domain question answering system (e.g., "Which disease?"). Despite using fewer in-domain resources, our models, solely trained on the generated datasets, largely outperform strong low-resource models by an average F1 score of 19.4 for six popular NER benchmarks. Furthermore, our models provide competitive performance with rich-resource models that additionally leverage in-domain dictionaries provided by domain experts. In few-shot NER, we outperform the previous best model by an F1 score of 5.2 on three benchmarks and achieve new state-of-the-art performance.
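The ask-to-generate idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `toy_qa`, `bio_tag`, and `generate_dataset` are hypothetical, the QA system is replaced by a canned lookup, and the use of BIO tags with exact token matching is an assumption made for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical interface for an open-domain QA system: a question maps to
# (answer phrase, evidence sentence) pairs. A real system would retrieve
# these from a large corpus; here they are canned for illustration.
QAFn = Callable[[str], List[Tuple[str, str]]]


def toy_qa(question: str) -> List[Tuple[str, str]]:
    canned = {
        "Which disease?": [
            ("malaria", "Mosquitoes can transmit malaria to humans ."),
            ("type 2 diabetes", "Obesity raises the risk of type 2 diabetes ."),
        ],
    }
    return canned.get(question, [])


def bio_tag(tokens: List[str], answer: List[str], label: str) -> List[str]:
    """Tag the first exact occurrence of `answer` in `tokens` with BIO labels."""
    tags = ["O"] * len(tokens)
    n = len(answer)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == answer:
            tags[i] = f"B-{label}"
            for j in range(i + 1, i + n):
                tags[j] = f"I-{label}"
            break
    return tags


def generate_dataset(
    questions: Dict[str, str], qa: QAFn
) -> List[Tuple[List[str], List[str]]]:
    """Build (tokens, tags) examples by asking one simple question per entity type."""
    dataset = []
    for label, question in questions.items():
        for answer, sentence in qa(question):
            tokens = sentence.split()
            tags = bio_tag(tokens, answer.split(), label)
            # Keep only sentences in which the answer span was actually found.
            if any(tag != "O" for tag in tags):
                dataset.append((tokens, tags))
    return dataset


examples = generate_dataset({"DISEASE": "Which disease?"}, toy_qa)
for tokens, tags in examples:
    print(list(zip(tokens, tags)))
```

Each generated example pairs a tokenized sentence with token-level labels, which is the usual input shape for training a sequence-labeling NER model. Swapping `toy_qa` for a real QA system would be the main change needed to go beyond a toy.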
