CL | Sep 30, 2025

QFrBLiMP: a Quebec-French Benchmark of Linguistic Minimal Pairs

David Beauchemin, Pier-Luc Veilleux, Richard Khoury, Johanna-Pascale Roy

arXiv:2509.25664v1 | 2 citations | h-index: 5

Originality: Synthesis-oriented

AI Analysis

The paper provides a domain-specific benchmark for assessing LLMs on Quebec-French linguistic phenomena and highlights their limitations in semantic understanding.

The paper introduces QFrBLiMP, a benchmark of 1,761 minimal pairs for evaluating LLMs on Quebec-French grammar. It finds that grammatical competence scales with model size, but that all benchmarked models fail on phenomena requiring deep semantic understanding, leaving a significant gap compared to human performance.

In this paper, we introduce the Quebec-French Benchmark of Linguistic Minimal Pairs (QFrBLiMP), a corpus designed to evaluate the linguistic knowledge of LLMs on prominent grammatical phenomena in Quebec-French. QFrBLiMP consists of 1,761 minimal pairs annotated with 20 linguistic phenomena. The minimal pairs were created by manually modifying sentences extracted from an official online resource maintained by a Québec government institution. Each pair is annotated by twelve native speakers of Quebec-French, who select the sentence of the two that they judge to be grammatical. These annotations are used to compare the competency of LLMs with that of humans. We evaluate different LLMs on QFrBLiMP and MultiBLiMP-Fr by measuring, for each category, how often a model assigns the higher probability to the grammatical sentence of a minimal pair. We find that while grammatical competence scales with model size, a clear hierarchy of difficulty emerges. All benchmarked models consistently fail on phenomena requiring deep semantic understanding, revealing a critical limitation and a significant gap compared to human performance on these specific tasks.
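
The evaluation described in the abstract can be sketched as a per-category accuracy over minimal pairs. The following is a minimal illustration, not the authors' code: the function name `minimal_pair_accuracy`, the input layout, the phenomenon labels, and the log-probability values are all hypothetical, and the sentence scores are assumed to have been computed beforehand by some language model. Ties are counted as failures here, which is also an assumption.

```python
from collections import defaultdict


def minimal_pair_accuracy(pairs):
    """Return, per phenomenon, the share of pairs where the grammatical
    sentence received the strictly higher score.

    `pairs` is an iterable of (phenomenon, logp_grammatical, logp_ungrammatical)
    tuples. This layout is a hypothetical one chosen for the sketch.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for phenomenon, logp_good, logp_bad in pairs:
        total[phenomenon] += 1
        # Assumption: a tie counts as a failure for the model.
        if logp_good > logp_bad:
            correct[phenomenon] += 1
    return {p: correct[p] / total[p] for p in total}


# Toy scores (invented for illustration, not taken from the paper).
toy_pairs = [
    ("agreement", -12.3, -15.1),  # grammatical sentence preferred
    ("agreement", -20.0, -18.5),  # ungrammatical sentence preferred
    ("negation", -9.4, -11.0),    # grammatical sentence preferred
]
scores = minimal_pair_accuracy(toy_pairs)
print(scores)
```

Reporting the accuracy per phenomenon rather than as one pooled number is what lets a hierarchy of difficulty across categories become visible, and the same function could be applied to human choices to compare against the annotators.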

