CL, Mar 16

Robust Language Identification for Romansh Varieties

Charlotte Model, Sina Ahmadi, Jannis Vamvas

arXiv:2603.15969

AI Analysis

The paper addresses the lack of documented systems for Romansh language identification, which enables applications such as idiom-aware spell checking or machine translation. The contribution is incremental, however, because it applies an existing SVM method to a new dataset.

The paper tackles the problem of distinguishing between regional varieties of the Romansh language, including a supra-regional variety, by building a language identification system that achieves an average in-domain accuracy of 97% on a new benchmark.

The Romansh language has several regional varieties, called idioms, which sometimes have limited mutual intelligibility. Despite this linguistic diversity, there has been a lack of documented efforts to build a language identification (LID) system that can distinguish between these idioms. Since Romansh LID should also be able to recognize Rumantsch Grischun, a supra-regional variety that combines elements of several idioms, this makes for a novel and interesting classification problem. In this paper, we present a LID system for Romansh idioms based on an SVM approach. We evaluate our model on a newly curated benchmark across two domains and find that it reaches an average in-domain accuracy of 97%, enabling applications such as idiom-aware spell checking or machine translation. Our classifier is publicly available.
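
The abstract names an SVM approach but does not describe the features, kernel, or training setup. The sketch below is therefore only an illustration of how an SVM-based language identifier can work in general, not the authors' system. It assumes character n-gram features and a one-vs-rest linear SVM trained by subgradient descent on the hinge loss. The labels (`variety_a`, `variety_b`, `variety_c`), the hyperparameters, and the training strings are hypothetical; the strings are invented nonsense words, not Romansh text.

```python
from collections import Counter


def char_ngrams(text, n_min=1, n_max=3):
    """Return L2-normalised character n-gram counts for one text."""
    padded = f" {text.lower()} "  # pad so word boundaries become features
    counts = Counter(
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    )
    norm = sum(c * c for c in counts.values()) ** 0.5
    return {gram: c / norm for gram, c in counts.items()}


def score(weights, bias, feats):
    """Signed distance-like score of a feature dict under one linear model."""
    return sum(weights.get(g, 0.0) * v for g, v in feats.items()) + bias


def train_binary_svm(samples, targets, epochs=50, lr=0.1, reg=0.01):
    """Linear SVM: subgradient descent on the L2-regularised hinge loss."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for feats, y in zip(samples, targets):  # y is +1.0 or -1.0
            violated = y * score(weights, bias, feats) < 1.0
            for g in weights:  # L2 shrinkage (the bias is not regularised)
                weights[g] *= 1.0 - lr * reg
            if violated:  # hinge-loss subgradient step
                for g, v in feats.items():
                    weights[g] = weights.get(g, 0.0) + lr * y * v
                bias += lr * y
    return weights, bias


def train_one_vs_rest(texts, labels):
    """Train one binary SVM per label (that label against all others)."""
    samples = [char_ngrams(t) for t in texts]
    return {
        label: train_binary_svm(
            samples, [1.0 if l == label else -1.0 for l in labels]
        )
        for label in sorted(set(labels))
    }


def predict(models, text):
    """Return the label whose binary SVM gives the highest score."""
    feats = char_ngrams(text)
    return max(models, key=lambda label: score(*models[label], feats))


# Invented placeholder data: nonsense strings, NOT real Romansh.
TOY_DATA = [
    ("zeka zake", "variety_a"), ("zake mazek", "variety_a"),
    ("puli pilu", "variety_b"), ("pilu tipul", "variety_b"),
    ("dory dyro", "variety_c"), ("dyro nydor", "variety_c"),
]
MODELS = train_one_vs_rest(
    [text for text, _ in TOY_DATA], [label for _, label in TOY_DATA]
)
print(predict(MODELS, "zeka mazek"))
```

A real system would train on labelled sentences for each idiom and for Rumantsch Grischun, and would report accuracy on a held-out benchmark as the paper does. The one-vs-rest setup is one common way to extend a binary SVM to several classes; whether the paper uses it is not stated in the abstract.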
