CL | Sep 12, 2023

Overview of GUA-SPA at IberLEF 2023: Guarani-Spanish Code Switching Analysis

arXiv:2309.06163v1 | 12 citations | h-index: 39
Originality: Synthesis-oriented
AI Analysis

The work addresses the need for tools that process under-resourced languages such as Guarani in code-switched contexts, but it is incremental, since it builds on existing shared task frameworks.

The paper analyzes code-switching between Guarani and Spanish by introducing GUA-SPA, the first shared task on the topic. The task covered token language identification, named entity recognition (NER), and a novel classification of how Spanish spans are used in code-switched text. Participants obtained generally good results on Task 1 and more mixed results on Tasks 2 and 3.

We present the first shared task for detecting and analyzing code-switching in Guarani and Spanish, GUA-SPA at IberLEF 2023. The challenge consisted of three tasks: identifying the language of a token, NER, and a novel task of classifying the way a Spanish span is used in the code-switched context. We annotated a corpus of 1500 texts extracted from news articles and tweets, around 25 thousand tokens, with the information for the tasks. Three teams took part in the evaluation phase, obtaining in general good results for Task 1, and more mixed results for Tasks 2 and 3.
