CL · Jun 10, 2019

Named Entity Recognition on Code-Switched Data: Overview of the CALCS 2018 Shared Task

Gustavo Aguilar, Fahad AlGhamdi, Victor Soto, Mona Diab, Julia Hirschberg, Thamar Solorio

arXiv:1906.04138v1
31.31113 citations

Originality: Synthesis-oriented

AI Analysis

This paper addresses Named Entity Recognition on code-switched data, where diverse entity types and the challenges of social-media text add to the difficulty of language mixing. The contribution is incremental, as it builds on existing shared task frameworks.

The paper tackles Named Entity Recognition on code-switched social-media data by establishing a new Twitter dataset with 9 entity types for the English-Spanish (ENG-SPA) and Modern Standard Arabic-Egyptian (MSA-EGY) language pairs. The best scores in the two competitions are 63.76% for ENG-SPA and 71.61% for MSA-EGY.

In the third shared task of the Computational Approaches to Linguistic Code-Switching (CALCS) workshop, we focus on Named Entity Recognition (NER) on code-switched social-media data. We divide the shared task into two competitions based on the English-Spanish (ENG-SPA) and Modern Standard Arabic-Egyptian (MSA-EGY) language pairs. We use Twitter data and 9 entity types to establish a new dataset for code-switched NER benchmarks. In addition to the CS phenomenon, the diversity of the entities and the social media challenges make the task considerably hard to process. As a result, the best scores of the competitions are 63.76% and 71.61% for ENG-SPA and MSA-EGY, respectively. We present the scores of 9 participants and discuss the most common challenges among submissions.
