CL
Oct 31, 2023

Ling-CL: Understanding NLP Models through Linguistic Curricula

arXiv:2310.20121v1132 citations
h-index: 14
Originality: Incremental advance
AI Analysis

This work addresses the need for better interpretability and evaluation in NLP. By bringing linguistic complexity into consideration early in research and development, it could inform work across NLP areas.

The authors study what linguistic knowledge NLP models acquire by developing data-driven curricula based on linguistic complexity. By analyzing benchmark datasets, they identify sets of linguistic metrics that inform the challenges and reasoning each task requires.

We employ a characterization of linguistic complexity from psycholinguistic and language acquisition research to develop data-driven curricula to understand the underlying linguistic knowledge that models learn to address NLP tasks. The novelty of our approach is in the development of linguistic curricula derived from data, existing knowledge about linguistic complexity, and model behavior during training. By analyzing several benchmark NLP datasets, our curriculum learning approaches identify sets of linguistic metrics (indices) that inform the challenges and reasoning required to address each task. Our work will inform future research in all NLP areas, allowing linguistic complexity to be considered early in the research and development process. In addition, our work prompts an examination of gold standards and fair evaluation in NLP.
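
As a rough illustration of the general idea of a complexity-ordered curriculum, the sketch below sorts training samples from easy to hard and exposes them in growing stages. It is not the authors' method: `complexity_proxy` is a hypothetical stand-in (word count plus mean word length) for the psycholinguistic indices the abstract refers to, and `easy_to_hard` and `curriculum_stages` are names invented here for illustration.

```python
from typing import Callable, List, Sequence


def complexity_proxy(sentence: str) -> float:
    """Hypothetical complexity index: word count plus mean word length.

    This is an assumption for illustration only, not an index from the paper.
    """
    words = sentence.split()
    if not words:
        return 0.0
    mean_word_len = sum(len(w) for w in words) / len(words)
    return len(words) + mean_word_len


def easy_to_hard(
    samples: Sequence[str],
    score: Callable[[str], float] = complexity_proxy,
) -> List[str]:
    """Return the samples sorted by ascending complexity score."""
    return sorted(samples, key=score)


def curriculum_stages(
    samples: Sequence[str],
    n_stages: int,
    score: Callable[[str], float] = complexity_proxy,
) -> List[List[str]]:
    """Build cumulative stages: stage k holds the easiest k/n_stages of the data."""
    ordered = easy_to_hard(samples, score)
    stages = []
    for k in range(1, n_stages + 1):
        # Ceiling division so the final stage always covers every sample.
        cutoff = max(1, (len(ordered) * k + n_stages - 1) // n_stages)
        stages.append(ordered[:cutoff])
    return stages


if __name__ == "__main__":
    data = [
        "The cat sat.",
        "A dog.",
        "Notwithstanding considerable psycholinguistic evidence, acquisition remains contested.",
    ]
    for i, stage in enumerate(curriculum_stages(data, n_stages=3), start=1):
        print(f"stage {i}: {len(stage)} sample(s)")
```

A model would be trained on each stage in turn. Swapping `score` for a different index changes which samples count as easy, and comparing such orderings is the kind of question the abstract raises about which indices characterize a task.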

Code Implementations: 1 repo
