CL Sep 28, 2019

Overview for the Second Shared Task on Language Identification in Code-Switched Data

Giovanni Molina, Fahad AlGhamdi, Mahmoud Ghoneim, Abdelati Hawwari, Nicolas Rey-Villamizar, Mona Diab, Thamar Solorio

arXiv:1909.13016v1 31.41115 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses the challenge of language identification in code-switched data for natural language processing applications, but it is incremental as it builds on a prior shared task.

The paper presents the second shared task on language identification in code-switched data, focusing on Modern Standard Arabic-Dialectal Arabic and Spanish-English language pairs, with nine participating teams showing that identification is harder for closely related languages and that systems improved over the previous year.

We present an overview of the second shared task on language identification in code-switched data. For the shared task, we had code-switched data from two different language pairs: Modern Standard Arabic-Dialectal Arabic (MSA-DA) and Spanish-English (SPA-ENG). We had a total of nine participating teams, with all teams submitting a system for SPA-ENG and four submitting for MSA-DA. Through evaluation, we found that once again language identification is more difficult for the language pair that is more closely related. We also found that this year's systems performed better overall than the systems from the previous shared task, indicating overall progress in the state of the art for this task.
