CL · AI · LG · Dec 13, 2024

HiTZ at VarDial 2025 NorSID: Overcoming Data Scarcity with Language Transfer and Automatic Data Annotation

arXiv:2412.10095v2 · 19 citations · h-index: 17 · COLING Workshops
Originality: Synthesis-oriented
AI Analysis

This work addresses data scarcity for NLP tasks in low-resource Norwegian dialects, but it is incremental, as it applies existing cross-lingual methods to a new dataset.

The paper tackles the NorSID Shared Task for Norwegian dialect processing by fine-tuning a multitask model in a cross-lingual setting for Intent Detection and Slot Filling, and a separate model for Dialect Identification. Performance on the test set did not drop relative to the development set.

In this paper we present our submission to the NorSID Shared Task, part of the 2025 VarDial Workshop (Scherrer et al., 2025). The shared task consists of three tasks, Intent Detection, Slot Filling and Dialect Identification, evaluated on data in different dialects of Norwegian. For Intent Detection and Slot Filling, we fine-tuned a multitask model in a cross-lingual setting to leverage the xSID dataset, which is available in 17 languages. For Dialect Identification, our final submission is a model fine-tuned on the provided development set, which obtained the highest scores in our experiments. Our final results on the test set show that our models do not drop in performance compared to the development set, likely due to the domain specificity of the dataset and the similar distribution of both subsets. We also report an in-depth analysis of the provided datasets and their artifacts, as well as further experiments that did not yield the best results. Finally, we analyze why some methods were more successful than others, focusing mainly on the impact of the combination of training languages and the domain specificity of the training data.
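The abstract does not spell out how the multitask model is trained. A common formulation for joint Intent Detection and Slot Filling adds a sentence-level intent loss to a token-level slot loss, so a single encoder learns both tasks. The sketch below assumes that formulation, with cross-entropy for both heads, the slot loss averaged over tokens, and a hypothetical `slot_weight` parameter; the function names `cross_entropy` and `joint_loss` are illustrative and are not taken from the paper.

```python
import math
from typing import Sequence


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Negative log-softmax probability of `target`, computed stably via log-sum-exp."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def joint_loss(
    intent_logits: Sequence[float],
    intent_label: int,
    slot_logits: Sequence[Sequence[float]],
    slot_labels: Sequence[int],
    slot_weight: float = 1.0,  # hypothetical weighting between the two tasks
) -> float:
    """Assumed multitask objective: intent loss + weighted mean per-token slot loss."""
    if not slot_logits or len(slot_logits) != len(slot_labels):
        raise ValueError("need one slot label per token and at least one token")
    # Sentence-level loss for the intent classifier head.
    intent_loss = cross_entropy(intent_logits, intent_label)
    # Token-level losses for the slot tagger head (e.g. over BIO tags), averaged.
    token_losses = [cross_entropy(l, y) for l, y in zip(slot_logits, slot_labels)]
    slot_loss = sum(token_losses) / len(token_losses)
    return intent_loss + slot_weight * slot_loss


if __name__ == "__main__":
    # Uniform logits: 3 intents and 2 slot tags per token give ln 3 + ln 2 = ln 6.
    print(joint_loss([0.0, 0.0, 0.0], 1, [[0.0, 0.0], [0.0, 0.0]], [0, 1]))
```

Under these assumptions, cross-lingual training would simply feed examples from several xSID languages through the same objective; the choice of which languages to combine is what the abstract reports as affecting results.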

