CL IR | Feb 22, 2023

FiNER-ORD: Financial Named Entity Recognition Open Research Dataset

Agam Shah, Abhinav Gullapalli, Ruchit Vithani, Michael Galarnyk, Sudheer Chava

Georgia Tech

arXiv:2302.11157v25.824 citations | h-index: 28 | Has Code

Originality: Synthesis-oriented

AI Analysis

This work provides a benchmark for financial NLP tasks and addresses domain-specific challenges, but it is incremental in that it adapts existing methods to a new dataset.

The authors tackled the lack of a domain-specific dataset for financial named entity recognition by creating FiNER-ORD, a high-quality English dataset, and benchmarked multiple pre-trained and large language models on it.

Over the last two decades, the development of the CoNLL-2003 named entity recognition (NER) dataset has helped enhance the capabilities of deep learning and natural language processing (NLP). The finance domain, characterized by its unique semantic and lexical variations for the same entities, presents specific challenges to the NER task; thus, a domain-specific customized dataset is crucial for advancing research in this field. In our work, we develop the first high-quality English Financial NER Open Research Dataset (FiNER-ORD). We benchmark multiple pre-trained language models (PLMs) and large language models (LLMs) on FiNER-ORD. We believe FiNER-ORD can serve as a benchmark for future work on financial domain-specific NER and NLP tasks. Our dataset, models, and code are publicly available on GitHub and Hugging Face under the CC BY-NC 4.0 license.
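The abstract does not say how the benchmarked models are scored. As a minimal sketch, assuming a CoNLL-style setup, the following computes entity-level precision, recall, and F1 over tag sequences. The `B-`/`I-`/`O` tag format, the `ORG`/`PER`/`LOC` labels, and the function names `extract_spans` and `entity_f1` are illustrative assumptions, not taken from the FiNER-ORD release.

```python
from typing import List, Set, Tuple

# (entity type, start index, end index exclusive)
Span = Tuple[str, int, int]


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO tag sequence (assumed tag format)."""
    spans: Set[Span] = set()
    start, current = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing span
        prefix, _, label = tag.partition("-")
        continues = prefix == "I" and label == current
        if current is not None and not continues:
            spans.add((current, start, i))
            start, current = None, None
        # A stray "I-" tag with no open span is treated leniently as a span start.
        if prefix == "B" or (prefix == "I" and current is None):
            start, current = i, label
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Exact-match entity-level precision, recall, and F1 over a corpus."""
    tp = fp = fn = 0
    for g_tags, p_tags in zip(gold, pred):
        g, p = extract_spans(g_tags), extract_spans(p_tags)
        tp += len(g & p)   # same type and same boundaries
        fp += len(p - g)   # predicted but not in gold
        fn += len(g - p)   # in gold but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy example: the prediction misses the single PER entity.
gold = [["B-ORG", "I-ORG", "O", "B-PER"], ["B-LOC", "O"]]
pred = [["B-ORG", "I-ORG", "O", "O"], ["B-LOC", "O"]]
precision, recall, f1 = entity_f1(gold, pred)
```

Scoring whole spans rather than individual tokens means a prediction counts only when both the entity type and its boundaries match the gold annotation, so partially recognized multi-token names receive no credit.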
