CL | Feb 18, 2025

The First Automatic C1 Evaluator for Basque (Euskarazko lehen C1 ebaluatzaile automatikoa)

arXiv:2503.01851v1 | 1 citation | h-index: 1 | IkerGazte. Nazioarteko ikerketa euskaraz
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for automated language assessment in Basque, a low-resource language, though it appears incremental as it applies existing methods to a new dataset.

The researchers trained a system on 10,000 transcribed compositions to judge automatically whether a Basque composition reaches C1 proficiency, using EDA, SCL, and regularization to counter data scarcity and overfitting.

In this project, we set out to develop an automatic evaluator that determines whether Basque compositions meet the C1 level. To train our system, we obtained 10,000 transcribed compositions through an agreement between HABE and HiTZ. We worked with several techniques to mitigate data scarcity and overfitting: EDA, SCL, and regularization. We also tested different language models to analyze their behavior. Finally, we analyzed the behavior of the different systems to measure model calibration and the impact of artifacts.
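The abstract does not say which calibration metric the authors used. As an illustration only, the sketch below computes expected calibration error (ECE), one common way to measure how well a classifier's confidence matches its accuracy. The function name, the equal-width binning, and the default of 10 bins are assumptions made for this example, not details taken from the paper.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Weighted average, over equal-width confidence bins, of
    |accuracy - mean confidence|. Hypothetical helper for illustration."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        raise ValueError("at least one prediction is required")

    # Group predictions by confidence; a confidence of 1.0 goes in the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        # Each bin contributes in proportion to the share of predictions it holds.
        ece += (len(members) / total) * abs(accuracy - mean_conf)
    return ece


# Example: a C1 / not-C1 classifier that is 90% confident on four essays
# but gets only three of them right.
example_ece = expected_calibration_error(
    [0.9, 0.9, 0.9, 0.9], [True, True, True, False]
)
```

For a binary C1 / not-C1 decision, the confidence passed in would typically be the probability of the predicted class, so values fall between 0.5 and 1. A lower ECE means the reported confidence tracks accuracy more closely.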

