CL · Sep 28, 2019

Overview for the Second Shared Task on Language Identification in Code-Switched Data

arXiv:1909.13016v1 · 115 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses language identification in code-switched data, a challenge for natural language processing applications. The contribution is incremental because it builds on a prior shared task.

The paper presents the second shared task on language identification in code-switched data, which covers two language pairs: Modern Standard Arabic-Dialectal Arabic and Spanish-English. The nine participating teams' results show that identification is harder for closely related languages and that systems improved over those from the previous year.

We present an overview of the second shared task on language identification in code-switched data. The shared task included code-switched data from two language pairs: Modern Standard Arabic-Dialectal Arabic (MSA-DA) and Spanish-English (SPA-ENG). Nine teams participated. All of them submitted a system for SPA-ENG, and four also submitted a system for MSA-DA. As in the previous shared task, evaluation showed that language identification is more difficult for the more closely related language pair. We also found that this year's systems performed better than those from the previous shared task, which indicates progress in the state of the art for this task.
