LLM-DER: A Named Entity Recognition Method Based on Large Language Models for Chinese Coal Chemical Domain
This work addresses the challenge of NER in specialized domains such as the Chinese coal chemical industry, where labeled data is limited and entity structures are complex, offering an incremental improvement over existing few-shot methods.
The paper tackles domain-specific Named Entity Recognition (NER) with scarce labeled data and complex entity structures, as found in the Chinese coal chemical domain. It proposes LLM-DER, a framework that uses Large Language Models to generate lists of relationships containing entity types and then removes misrecognized entities through plausibility and consistency evaluation. The results show that LLM-DER outperforms GPT-3.5-turbo and fully-supervised baselines on the Resume and Coal datasets.
Domain-specific Named Entity Recognition (NER), whose goal is to recognize domain-specific entities and their categories, provides important support for constructing domain knowledge graphs. Deep learning-based methods are widely used and effective in NER tasks, but they rely on large-scale labeled data, so the scarcity of labeled data in a specific domain limits their application. Many studies have therefore introduced few-shot methods and achieved some results. However, entity structures in specific domains are often complex, and current few-shot methods are difficult to adapt to NER tasks with complex features. Taking the Chinese coal chemical industry domain as an example, there are complex structures in which multiple entities share a single entity, as well as multiple relationships for the same pair of entities, which affects the NER task under few-shot conditions. In this paper, we propose LLM-DER, a Large Language Model (LLM)-based entity recognition framework for domain-specific entity recognition in Chinese. It enriches entity information by using LLMs to generate a list of relationships containing entity types, and it applies a plausibility and consistency evaluation method to remove misrecognized entities, which can effectively solve the problem of recognizing entities with complex structures in a specific domain. Experimental results on the Resume dataset and the self-constructed coal chemical dataset Coal show that LLM-DER performs strongly in domain-specific entity recognition, not only outperforming the existing GPT-3.5-turbo baseline but also exceeding the fully-supervised baseline, verifying its effectiveness in entity recognition.
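The abstract does not spell out how the plausibility and consistency evaluation works, so the following is only a minimal sketch of one way such a filter could look, not the authors' implementation. It assumes that the LLM has already been sampled several times and that each run returned typed relation triples. The schema, the entity type names, the example triples, the `min_agreement` threshold, and all function names are hypothetical and chosen for illustration only.

```python
from collections import Counter, defaultdict

# Hypothetical relation schema: relation -> (head type, tail type).
SCHEMA = {
    "produces": ("PROCESS", "PRODUCT"),
    "uses": ("PROCESS", "EQUIPMENT"),
}

# Hypothetical LLM outputs: three sampled runs over the same sentence.
# Each triple is ((head mention, head type), relation, (tail mention, tail type)).
GASIFICATION = ("coal gasification", "PROCESS")
RUNS = [
    [(GASIFICATION, "produces", ("syngas", "PRODUCT")),
     (GASIFICATION, "uses", ("gasifier", "EQUIPMENT"))],
    [(GASIFICATION, "produces", ("syngas", "PRODUCT")),
     (GASIFICATION, "uses", ("slag", "EQUIPMENT")),
     (("syngas", "EQUIPMENT"), "produces", ("methanol", "PRODUCT"))],
    [(GASIFICATION, "produces", ("syngas", "PRODUCT")),
     (GASIFICATION, "uses", ("gasifier", "EQUIPMENT"))],
]


def plausible(triple):
    """Assumed plausibility check: entity types must fit the relation schema."""
    (_, head_type), relation, (_, tail_type) = triple
    return SCHEMA.get(relation) == (head_type, tail_type)


def consistent_entities(runs, min_agreement=0.6):
    """Assumed consistency check: keep a mention only if its majority type
    appears in at least `min_agreement` of the sampled runs."""
    votes = defaultdict(Counter)
    for triples in runs:
        seen = set()
        for head, _, tail in triples:
            seen.update([head, tail])
        # Each (mention, type) pair votes at most once per run.
        for mention, entity_type in seen:
            votes[mention][entity_type] += 1
    kept = {}
    for mention, counter in votes.items():
        entity_type, count = counter.most_common(1)[0]
        if count / len(runs) >= min_agreement:
            kept[mention] = entity_type
    return kept


def filter_entities(runs, min_agreement=0.6):
    """Drop implausible triples first, then keep only consistent entities."""
    plausible_runs = [[t for t in triples if plausible(t)] for triples in runs]
    return consistent_entities(plausible_runs, min_agreement)


if __name__ == "__main__":
    print(filter_entities(RUNS))
```

In this toy input, "coal gasification" is shared by several triples, mirroring the shared-entity structure mentioned above. The triple that labels "syngas" as EQUIPMENT fails the schema check, so "methanol" never receives a vote, and "slag" is dropped because it appears in only one of the three runs.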