CL Oct 22, 2025

BLiSS 1.0: Evaluating Bilingual Learner Competence in Second Language Small Language Models

Yuan Gao, Suchir Salhan, Andrew Caines, Paula Buttery, Weiwei Sun

arXiv:2510.19419v1 2.7 h-index: 17

Originality: Incremental advance

AI Analysis

BLiSS 1.0 provides a robust tool for measuring how training objectives affect a model's alignment with human language acquisition patterns, addressing a gap in the evaluation of cognitively inspired models.

The paper addresses how to evaluate bilingual learner competence in second language small language models. It introduces BLiSS 1.0, a benchmark based on selective tolerance, and finds that this capability is distinct from standard grammaticality and that performance clusters by training paradigm.

To bridge the gap between performance-oriented benchmarks and the evaluation of cognitively inspired models, we introduce BLiSS 1.0, a Benchmark of Learner Interlingual Syntactic Structure. Our benchmark operationalizes a new paradigm of selective tolerance, testing whether a model finds a naturalistic learner error more plausible than a matched, artificial error within the same sentence. Constructed from over 2.8 million naturalistic learner sentences, BLiSS provides 136,867 controlled triplets (corrected, learner, artificial) for this purpose. Experiments on a diverse suite of models demonstrate that selective tolerance is a distinct capability from standard grammaticality, with performance clustering strongly by training paradigm. This validates BLiSS as a robust tool for measuring how different training objectives impact a model's alignment with the systematic patterns of human language acquisition.
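The selective tolerance comparison described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Triplet` class, the `selective_tolerance_rate` function, the example sentences, and the scores are all hypothetical, and the sketch assumes that "more plausible" means the model assigns a higher sentence-level score (for example, a log-probability) to the learner sentence than to the artificial one.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Triplet:
    """One benchmark item: the same sentence in three variants (hypothetical layout)."""
    corrected: str   # grammatical version
    learner: str     # contains a naturalistic learner error
    artificial: str  # contains a matched, artificial error


def selective_tolerance_rate(
    triplets: Iterable[Triplet],
    score: Callable[[str], float],
) -> float:
    """Fraction of triplets where the learner sentence scores strictly higher
    than the artificial one. `score` is assumed to return a plausibility value
    where higher means more plausible, e.g. a sentence log-probability."""
    items = list(triplets)
    if not items:
        raise ValueError("need at least one triplet")
    hits = sum(score(t.learner) > score(t.artificial) for t in items)
    return hits / len(items)


# Invented sentences and scores for illustration only; none come from BLiSS.
toy_triplets = [
    Triplet(
        corrected="She goes to school every day.",
        learner="She go to school every day.",
        artificial="She goes to school every days.",
    ),
    Triplet(
        corrected="I have lived here for two years.",
        learner="I live here since two years.",
        artificial="I have lived here for two year.",
    ),
]
toy_scores = {
    "She go to school every day.": -12.0,
    "She goes to school every days.": -15.0,
    "I live here since two years.": -20.0,
    "I have lived here for two year.": -18.0,
}

# A dictionary lookup stands in for a real language model scorer.
rate = selective_tolerance_rate(toy_triplets, toy_scores.__getitem__)
```

In practice the `score` argument would wrap a language model; the corrected sentence is not used in this particular comparison, which is one way to see why the measure can diverge from a standard grammaticality test.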
