CL IR Oct 16, 2018

Large Language Models for Few-Shot Named Entity Recognition

Yufei Zhao, Xiaoshi Zhong, Erik Cambria, Jagath C. Rajapakse

arXiv:1810.06818v3 0.73 citations

Originality: Incremental advance

AI Analysis

The paper addresses the problem of leveraging large language models for few-shot NER with minimal human effort. The contribution is incremental because it builds on existing prompting techniques.

The paper tackles few-shot named entity recognition by proposing GPT4NER, a method that uses structured prompts to recast the task as sequence generation with large language models. It achieves F1 scores of 83.15% on CoNLL2003 and 70.37% on OntoNotes5.0, outperforming few-shot baselines by an average of 7 points.
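To make the idea of a structured prompt concrete, here is a minimal Python sketch that assembles a prompt from the three components named in the abstract (entity definition, few-shot examples, chain-of-thought) and parses the generated text back into entities. It is not the authors' implementation: the prompt layout, the `mention | TYPE` output format, the wording of the definitions, and the names `Example`, `build_prompt`, and `parse_entities` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """One few-shot demonstration (hypothetical structure)."""
    sentence: str
    reasoning: str                      # chain-of-thought text, wording is illustrative
    entities: list[tuple[str, str]]     # (mention, entity type)


# Hypothetical entity definitions; the paper's actual wording is not given in this page.
ENTITY_DEFINITIONS = {
    "PER": "names of people",
    "LOC": "names of locations such as cities and countries",
    "ORG": "names of companies, agencies, and institutions",
    "MISC": "other named entities such as nationalities and events",
}


def format_entities(entities):
    """Render entities as 'mention | TYPE; mention | TYPE', or 'none' if empty."""
    return "; ".join(f"{mention} | {etype}" for mention, etype in entities) or "none"


def build_prompt(definitions, examples, sentence):
    """Concatenate entity definitions, few-shot examples, and the query sentence."""
    lines = ["Entity types:"]
    lines += [f"- {etype}: {text}" for etype, text in definitions.items()]
    lines.append("")
    for ex in examples:
        lines.append(f"Sentence: {ex.sentence}")
        lines.append(f"Reasoning: {ex.reasoning}")
        lines.append(f"Entities: {format_entities(ex.entities)}")
        lines.append("")
    # The prompt ends with an open 'Reasoning:' slot for the model to continue.
    lines.append(f"Sentence: {sentence}")
    lines.append("Reasoning:")
    return "\n".join(lines)


def parse_entities(generated):
    """Read (mention, type) pairs from the last 'Entities:' line of generated text."""
    for line in reversed(generated.splitlines()):
        if line.startswith("Entities:"):
            body = line[len("Entities:"):].strip()
            if not body or body == "none":
                return []
            pairs = []
            for item in body.split(";"):
                mention, sep, etype = item.partition("|")
                if sep:  # skip malformed items that lack the separator
                    pairs.append((mention.strip(), etype.strip()))
            return pairs
    return []
```

The string returned by `build_prompt` would be sent to whichever LLM is in use, and `parse_entities` applied to the completion. Treating the output as text to be parsed, rather than as one label per token, is what distinguishes the sequence-generation framing from sequence labeling.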

Named entity recognition (NER) is a fundamental task in numerous downstream applications. Recently, researchers have employed pre-trained language models (PLMs) and large language models (LLMs) to address this task. However, fully leveraging the capabilities of PLMs and LLMs with minimal human effort remains challenging. In this paper, we propose GPT4NER, a method that prompts LLMs to solve the few-shot NER task. GPT4NER constructs effective prompts from three key components: entity definition, few-shot examples, and chain-of-thought. By prompting LLMs with these prompts, GPT4NER transforms few-shot NER, which is traditionally treated as a sequence-labeling problem, into a sequence-generation problem. We conduct experiments on two benchmark datasets, CoNLL2003 and OntoNotes5.0, and compare the performance of GPT4NER to representative state-of-the-art models in both few-shot and fully supervised settings. Experimental results demonstrate that GPT4NER achieves an F1 score of 83.15% on CoNLL2003 and 70.37% on OntoNotes5.0, significantly outperforming few-shot baselines by an average margin of 7 points. Compared to fully supervised baselines, GPT4NER achieves 87.9% of their best performance on CoNLL2003 and 76.4% of their best performance on OntoNotes5.0. We also use a relaxed-match metric for evaluation and report performance on the sub-task of named entity extraction (NEE); experiments demonstrate that both help to better understand model behavior in the NER task.
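The abstract does not define the relaxed-match metric, so the following Python sketch only illustrates how a relaxed criterion can differ from exact matching when computing F1. It assumes, purely for illustration, that a relaxed match means a predicted span has the same type as a gold span and overlaps it by at least one token; the paper's actual definition may differ. The function names and the `(start, end, type)` span format are likewise hypothetical.

```python
def exact_match(pred, gold):
    """Strict criterion: boundaries and type must be identical."""
    return pred == gold


def relaxed_match(pred, gold):
    """Assumed relaxed criterion: same type and at least one shared token.

    Spans are (start, end, type) with an exclusive end index.
    """
    return pred[2] == gold[2] and pred[0] < gold[1] and gold[0] < pred[1]


def f1_score(gold, pred, match):
    """Entity-level F1 under the given matching criterion."""
    # Precision counts predictions that match some gold span;
    # recall counts gold spans that are matched by some prediction.
    matched_pred = sum(1 for p in pred if any(match(p, g) for g in gold))
    matched_gold = sum(1 for g in gold if any(match(p, g) for p in pred))
    precision = matched_pred / len(pred) if pred else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data: the first prediction gets the type right but misses one token.
gold_spans = [(0, 2, "PER"), (5, 6, "LOC")]
pred_spans = [(0, 1, "PER"), (5, 6, "LOC")]

strict_f1 = f1_score(gold_spans, pred_spans, exact_match)
relaxed_f1 = f1_score(gold_spans, pred_spans, relaxed_match)
```

On this toy input the strict score penalizes the boundary error while the relaxed score does not, which is the kind of gap that can reveal whether a model's mistakes are mostly about boundaries or about missing entities altogether.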
