Mar 6, 2023

Two-stage Pipeline for Multilingual Dialect Detection

arXiv:2303.03487v2270 citations, h-index: 5, Has Code
AI Analysis

This work addresses dialect detection for multilingual applications, but it is incremental as it builds on existing shared task frameworks.

The paper tackles dialect identification for localizing Large Language Models by proposing a two-stage system for the VarDial 2023 shared task. It reports scores of 58.54% on the 9-way classification (Track-1) and 85.61% on the 6-way classification (Track-2).

Dialect identification is a crucial task for localizing Large Language Models. This paper outlines our approach to the VarDial 2023 shared task. The task is to identify three (Track-1) or two (Track-2) dialects for each of three languages, which yields a 9-way classification for Track-1 and a 6-way classification for Track-2. Our proposed approach consists of a two-stage system and outperforms other participants' systems and previous work in this domain. We achieve a score of 58.54% for Track-1 and 85.61% for Track-2. Our codebase is publicly available (https://github.com/ankit-vaidya19/EACL_VarDial2023).
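The abstract does not say what the two stages are. As a minimal sketch, assume stage 1 picks the language and stage 2 applies a dialect classifier specific to that language. The language and variety names, the helper functions, and the toy classifiers below are all hypothetical placeholders, not taken from the paper or its codebase. The sketch also checks the label-space arithmetic: three languages with three labels each give 9 classes, and with two labels each give 6.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical label inventory: placeholder names, not the shared task's real labels.
TRACK1_LABELS: Dict[str, List[str]] = {
    "lang_a": ["variety_1", "variety_2", "variety_3"],
    "lang_b": ["variety_1", "variety_2", "variety_3"],
    "lang_c": ["variety_1", "variety_2", "variety_3"],
}
# Track-2 style inventory: only two labels per language.
TRACK2_LABELS: Dict[str, List[str]] = {
    lang: labels[:2] for lang, labels in TRACK1_LABELS.items()
}


def flat_label_count(labels: Dict[str, List[str]]) -> int:
    """Size of the flattened (language, dialect) label space."""
    return sum(len(dialects) for dialects in labels.values())


def two_stage_predict(
    text: str,
    identify_language: Callable[[str], str],
    dialect_models: Dict[str, Callable[[str], str]],
) -> Tuple[str, str]:
    """Assumed pipeline: route the text by language, then classify its dialect."""
    language = identify_language(text)        # stage 1: language identification
    dialect = dialect_models[language](text)  # stage 2: language-specific classifier
    return language, dialect


def toy_language_id(text: str) -> str:
    """Toy stand-in: reads the language from the first token of the input."""
    return text.split()[0]


def make_toy_dialect_model(labels: List[str]) -> Callable[[str], str]:
    """Toy stand-in: picks a label from the text length, for demonstration only."""
    def model(text: str) -> str:
        return labels[len(text) % len(labels)]
    return model


toy_dialect_models = {
    lang: make_toy_dialect_model(labels) for lang, labels in TRACK1_LABELS.items()
}

if __name__ == "__main__":
    print(flat_label_count(TRACK1_LABELS), flat_label_count(TRACK2_LABELS))
    print(two_stage_predict("lang_b hello", toy_language_id, toy_dialect_models))
```

In a real system the two toy functions would be replaced by trained models. The routing structure is the only part this sketch is meant to illustrate.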

Code Implementations: 1 repo
