CL · Apr 21, 2022

TEAM-Atreides at SemEval-2022 Task 11: On leveraging data augmentation and ensemble to recognize complex Named Entities in Bangla

Nazia Tasnim, Md. Istiak Hossain Shihab, Asif Shahriyar Sushmit, Steven Bethard, Farig Sadeque

arXiv:2204.09964v1

Originality: Synthesis-oriented

AI Analysis

This work addresses the challenge of complex entity recognition in Bangla for NLP applications, but it is incremental, as it builds on existing methods such as ELECTRA and ensemble techniques.

The paper tackles the problem of recognizing complex named entities in Bangla, such as nested or overlapping mentions, by using an ensemble of ELECTRA-based models together with data augmentation, and it achieves competitive performance in SemEval 2022 Task 11.

Many areas, such as the biological and healthcare domains, artistic works, and organization names, have nested, overlapping, or discontinuous entity mentions that may even be syntactically or semantically ambiguous in practice. Traditional sequence tagging algorithms are unable to recognize these complex mentions because the mentions may violate the assumptions upon which sequence tagging schemes are founded. In this paper, we describe our contribution to SemEval 2022 Task 11 on identifying such complex Named Entities. We leverage an ensemble of multiple ELECTRA-based models pretrained exclusively on the Bangla language, together with the performance of ELECTRA-based models pretrained on English, to achieve competitive performance on Track 11. Besides providing a system description, we also present the outcomes of our experiments on architectural decisions, dataset augmentations, and post-competition findings.
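The abstract does not state how the individual models' predictions are combined. As a rough illustration only, the sketch below assumes a simple per-token majority vote over tag sequences produced by several taggers; the function name `majority_vote`, the tie-breaking rule, and the example tags are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Combine per-token tag sequences from several models by majority vote.

    `predictions` holds one tag sequence per model, all for the same sentence.
    This is an assumed combination scheme, not the authors' documented method.
    """
    if not predictions:
        raise ValueError("need at least one model's predictions")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all tag sequences must have the same length")

    voted = []
    for token_tags in zip(*predictions):
        counts = Counter(token_tags)
        top = max(counts.values())
        # Tie-break (an assumption): prefer the earliest model in the list.
        winner = next(tag for tag in token_tags if counts[tag] == top)
        voted.append(winner)
    return voted


# Hypothetical outputs from three taggers for a three-token sentence.
model_outputs = [
    ["B-PER", "I-PER", "O"],
    ["B-PER", "O", "O"],
    ["B-LOC", "I-PER", "O"],
]
ensemble_tags = majority_vote(model_outputs)
```

A vote over whole tags like this can produce sequences that are invalid under a BIO scheme (for example an `I-` tag without a preceding `B-`), so a real system would likely need a repair or constrained-decoding step afterwards.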
