CL | Nov 12, 2024

Controlled Evaluation of Syntactic Knowledge in Multilingual Language Models

arXiv:2411.07474v2 | 24 citations | h-index: 1 | COLING Workshops
Originality: Synthesis-oriented
AI Analysis

This work addresses the limited understanding of how LMs generalize syntactically in low-resource languages, which account for much of the world's syntactic diversity. The contribution is incremental: it extends existing targeted evaluation methods to new languages.

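The study develops targeted syntactic evaluation tests for three low-resource languages (Basque, Hindi, and Swahili) and uses them to assess multilingual Transformer LMs. Some tasks prove relatively easy, while others are challenging, such as agreement in sentences containing indirect objects in Basque and agreement across a prepositional phrase in Swahili. The study also uncovers issues with specific models: a bias toward the habitual aspect in Hindi in multilingual BERT, and underperformance of XGLM-4.5B relative to similar-sized models.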

Language models (LMs) are capable of acquiring elements of human-like syntactic knowledge. Targeted syntactic evaluation tests have been employed to measure how well they form generalizations about syntactic phenomena in high-resource languages such as English. However, we still lack a thorough understanding of LMs' capacity for syntactic generalizations in low-resource languages, which are responsible for much of the diversity of syntactic patterns worldwide. In this study, we develop targeted syntactic evaluation tests for three low-resource languages (Basque, Hindi, and Swahili) and use them to evaluate five families of open-access multilingual Transformer LMs. We find that some syntactic tasks prove relatively easy for LMs while others (agreement in sentences containing indirect objects in Basque, agreement across a prepositional phrase in Swahili) are challenging. We additionally uncover issues with publicly available Transformers, including a bias toward the habitual aspect in Hindi in multilingual BERT and underperformance compared to similar-sized models in XGLM-4.5B.
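As an illustration of how a targeted syntactic evaluation is commonly scored, the sketch below compares a model's score for a grammatical sentence against a minimally different ungrammatical one and reports the fraction of pairs where the grammatical one wins. This is a minimal sketch under stated assumptions, not the paper's implementation: `minimal_pair_accuracy`, `toy_score`, the bigram table, and the English example pairs are all hypothetical, and `toy_score` merely stands in for a real LM's sentence log-probability.

```python
from typing import Callable, Iterable, Tuple

# A minimal pair: (grammatical sentence, ungrammatical sentence).
MinimalPair = Tuple[str, str]

# Hypothetical toy "model": log-scores for a few adjacent word pairs.
# A real evaluation would use sentence log-probabilities from an LM instead.
TOY_BIGRAM_LOGPROB = {
    ("keys", "are"): -1.0,
    ("keys", "is"): -3.0,
    ("key", "is"): -1.0,
    ("key", "are"): -3.0,
}
DEFAULT_LOGPROB = -2.0  # assumed score for any bigram not in the table


def toy_score(sentence: str) -> float:
    """Sum toy bigram log-scores over the sentence (stand-in for an LM)."""
    tokens = sentence.lower().split()
    return sum(
        TOY_BIGRAM_LOGPROB.get(bigram, DEFAULT_LOGPROB)
        for bigram in zip(tokens, tokens[1:])
    )


def minimal_pair_accuracy(
    pairs: Iterable[MinimalPair], score: Callable[[str], float]
) -> float:
    """Fraction of pairs where the grammatical sentence scores strictly higher."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one minimal pair")
    correct = sum(1 for good, bad in pairs if score(good) > score(bad))
    return correct / len(pairs)


# Hypothetical English pairs; the last one separates subject and verb
# with a prepositional phrase, which the purely local toy scorer cannot handle.
PAIRS = [
    ("the keys are here", "the keys is here"),
    ("the key is here", "the key are here"),
    ("the keys to the cabinet are here", "the keys to the cabinet is here"),
]

if __name__ == "__main__":
    print(f"accuracy: {minimal_pair_accuracy(PAIRS, toy_score):.2f}")
```

The toy scorer only sees adjacent words, so it ties on the third pair and gets 2 of 3 correct. Swapping `toy_score` for a function that returns a real model's sentence log-probability would leave the accuracy computation unchanged.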

Code Implementations: 1 repo