CL | AI | Oct 26, 2025

Low-Resource Dialect Adaptation of Large Language Models: A French Dialect Case-Study

arXiv:2510.22747v1 | 1 citation | h-index: 4
Originality: Incremental advance
AI Analysis

This work addresses the dialect gap for minority linguistic communities by offering a cost-effective way to expand high-quality LLM access. It is incremental, however, because it builds on existing continual pre-training and parameter-efficient fine-tuning techniques.

The paper adapts large language models to low-resource dialects such as Québec French using continual pre-training with parameter-efficient methods. With under 1% of model parameters updated, the adapted models improve on minority dialect benchmarks with minimal regression on prestige language benchmarks.

Despite the widespread adoption of large language models (LLMs), their strongest capabilities remain largely confined to a small number of high-resource languages for which there is abundant training data. Recently, continual pre-training (CPT) has emerged as a means to adapt these models to low-resource regional dialects. In this paper, we study the use of CPT for dialect learning under tight data and compute budgets. Using low-rank adaptation (LoRA) and compute-efficient continual pre-training, we adapt three LLMs to the Québec French dialect using a very small dataset and benchmark them on the COLE suite. Our experiments demonstrate an improvement on the minority dialect benchmarks, with minimal regression on the prestige language benchmarks and under 1% of model parameters updated. Analysis of the results demonstrates that gains are highly contingent on corpus composition. These findings indicate that CPT with parameter-efficient fine-tuning (PEFT) can narrow the dialect gap by providing cost-effective and sustainable language resource creation, expanding high-quality LLM access to minority linguistic communities. We release the first Québec French LLMs on HuggingFace.
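
To give a sense of why LoRA can keep the updated share of parameters under 1%, here is a minimal back-of-the-envelope sketch. The model dimensions, rank, and number of adapted matrices per layer are hypothetical illustration values, not the paper's configuration, and the helper names are ours. For a weight matrix of shape d_out x d_in, LoRA trains two low-rank factors with rank * (d_in + d_out) parameters in total, while the original matrix stays frozen.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix.

    The update is B @ A with A of shape (rank, d_in) and B of shape (d_out, rank).
    """
    return rank * (d_in + d_out)


def trainable_fraction(n_layers: int, hidden: int, rank: int,
                       base_params: int, adapted_per_layer: int = 2) -> float:
    """Share of all parameters that are trainable when only LoRA adapters train.

    Assumes (hypothetically) that `adapted_per_layer` square hidden x hidden
    projections per layer receive an adapter, e.g. the query and value matrices.
    """
    adapter = n_layers * adapted_per_layer * lora_param_count(hidden, hidden, rank)
    return adapter / (base_params + adapter)


# Hypothetical 7B-class model: 32 layers, hidden size 4096, LoRA rank 16.
frac = trainable_fraction(n_layers=32, hidden=4096, rank=16,
                          base_params=7_000_000_000)
print(f"trainable share: {frac:.4%}")
```

Under these assumed values the adapters amount to a small fraction of one percent of the model. Raising the rank or adapting more matrices per layer increases that share proportionally.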

