CL · Feb 17

LuxMT Technical Report

arXiv:2602.15506v1

Originality: Synthesis-oriented

AI Analysis

This work addresses machine translation for low-resource Luxembourgish but is incremental, as it builds on existing models and data.

The authors tackle machine translation from Luxembourgish into French and English by fine-tuning Gemma 3 27B and building a novel benchmark. The resulting system improves strongly over the baseline, including for translation into German, for which it saw no training data.

We introduce LuxMT, a machine translation system based on Gemma 3 27B and fine-tuned for translation from Luxembourgish (LB) into French (FR) and English (EN). To assess translation performance, we construct a novel benchmark covering LB-FR, LB-EN, and LB-DE (German), using human-translated data from Luci, a tourist magazine about Luxembourg. Training data stems from LuxAlign, a parallel corpus of multilingual Luxembourgish news articles, and from LB parliamentary transcripts augmented with translations from Google Translate. We filter the data using LuxEmbedder, a model of LB sentence embeddings, to remove segment pairs with low translation equivalence. Overall, LuxMT's results suggest strong improvements over the Gemma 3 baseline, even for translating LB into DE, although the training data contains no DE. We also explore LuxEmbedder's potential as a quality estimation metric and find strong correlations with established reference-based metrics. However, we call for further research to fully assess the metric's utility and advise using it with caution.
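The abstract does not specify how LuxEmbedder scores are applied. A common approach, assumed here rather than taken from the report, is to score each segment pair by the cosine similarity of its two sentence embeddings: pairs below a threshold are dropped during filtering, and the same score computed between a source sentence and an MT output can act as a reference-free quality estimate, whose agreement with a reference-based metric can then be measured with Pearson correlation. In this sketch, `embed`, `threshold=0.8`, and the toy vectors are hypothetical stand-ins, not LuxEmbedder's actual API or the report's settings.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def filter_pairs(
    pairs: list[tuple[str, str]],
    embed: Callable[[str], Vector],
    threshold: float = 0.8,  # hypothetical cut-off, not the report's value
) -> list[tuple[str, str]]:
    """Keep (source, target) pairs whose embeddings are similar enough.

    `embed` is a hypothetical stand-in for a cross-lingual sentence encoder.
    """
    return [
        (src, tgt)
        for src, tgt in pairs
        if cosine(embed(src), embed(tgt)) >= threshold
    ]


def qe_score(source: str, hypothesis: str, embed: Callable[[str], Vector]) -> float:
    """Reference-free quality estimate: similarity of a source and its MT output."""
    return cosine(embed(source), embed(hypothesis))


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation, e.g. between QE scores and a reference-based metric."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for constant inputs")
    return cov / math.sqrt(var_x * var_y)


# Toy 2-d "embeddings" standing in for real encoder output (hypothetical).
toy = {
    "Moien": [1.0, 0.0],
    "Bonjour": [0.9, 0.1],
    "Au revoir": [0.0, 1.0],
}
pairs = [("Moien", "Bonjour"), ("Moien", "Au revoir")]
kept = filter_pairs(pairs, lambda s: toy[s], threshold=0.8)
print(kept)  # the orthogonal (mismatched) pair is filtered out
```

Thresholding a single similarity score keeps filtering cheap and language-agnostic; the trade-off is that the cut-off has to be tuned, since too high a value discards valid but loosely aligned translations.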
