CL · Jun 18, 2024

Composited-Nested-Learning with Data Augmentation for Nested Named Entity Recognition

Xingming Liao, Nankai Lin, Haowen Li, Lianglun Cheng, Zhuowei Wang, Chong Chen

arXiv:2406.12779v11.0

Originality: Incremental advance

AI Analysis

This work addresses data scarcity for natural language processing researchers and practitioners working on nested entity recognition, but the advance is incremental: it adapts data augmentation specifically to NNER.

The paper tackles the scarcity of annotated data for Nested Named Entity Recognition (NNER) by proposing a data augmentation method that uses Composited-Nested-Learning and a Confidence Filtering Mechanism, yielding improvements on the ACE2004 and ACE2005 datasets and alleviating sample imbalance.

Nested Named Entity Recognition (NNER) addresses the recognition of overlapping entities. Compared with Flat Named Entity Recognition (FNER), annotated corpora for NNER are scarce. Data augmentation is an effective way to compensate for an insufficient annotated corpus, yet augmentation methods for NNER remain largely unexplored: because NNER involves nested entities, existing data augmentation methods cannot be applied to it directly. In this work, we therefore focus on data augmentation for NNER and resort to a more expressive structure, Composited-Nested-Label Classification (CNLC), in which constituents are combined by nested-word and nested-label, to model nested entities. The dataset is augmented using Composited-Nested-Learning (CNL). In addition, we propose the Confidence Filtering Mechanism (CFM) for more efficient selection of generated data. Experimental results demonstrate that this approach yields improvements on ACE2004 and ACE2005 and alleviates the impact of sample imbalance.
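The abstract does not spell out how composited nested labels or confidence filtering are implemented, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a BIO tagging scheme, a `|` separator for joining the labels of overlapping entities, and a fixed score threshold for filtering; the names `composite_labels` and `confidence_filter` are hypothetical.

```python
from typing import List, Tuple

# (start, end_inclusive, entity_type): an assumed span representation.
Span = Tuple[int, int, str]


def composite_labels(n_tokens: int, spans: List[Span], sep: str = "|") -> List[str]:
    """Join the BIO tags of every entity covering a token into one label.

    Assumption: outer (longer) spans are listed first in the joined label.
    """
    # Longer spans first, ties broken by start position.
    ordered = sorted(spans, key=lambda s: (-(s[1] - s[0]), s[0]))
    per_token: List[List[str]] = [[] for _ in range(n_tokens)]
    for start, end, etype in ordered:
        for i in range(start, end + 1):
            prefix = "B" if i == start else "I"
            per_token[i].append(f"{prefix}-{etype}")
    # Tokens outside every entity get the usual "O" tag.
    return [sep.join(tags) if tags else "O" for tags in per_token]


def confidence_filter(
    samples: List[Tuple[str, float]], threshold: float = 0.9
) -> List[str]:
    """Keep generated samples whose confidence score meets the threshold.

    Assumption: each sample already carries a score from some model.
    """
    return [text for text, score in samples if score >= threshold]


tokens = ["The", "president", "of", "France", "spoke"]
spans = [(0, 3, "PER"), (3, 3, "GPE")]  # "France" is nested inside the PER span
labels = composite_labels(len(tokens), spans)
kept = confidence_filter([("augmented sentence A", 0.95), ("augmented sentence B", 0.5)])
```

Here the token "France" receives the single composited label `I-PER|B-GPE`, which lets a flat sequence carry nested structure, and only the higher-scoring augmented sentence survives the filter.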
