CL · May 21, 2022

Life after BERT: What do Other Muppets Understand about Language?

Vladislav Lialin, Kevin Zhao, Namrata Shivagunde, Anna Rumshisky

arXiv:2205.10696 · h-index: 35 · Has Code

Originality: Synthesis-oriented

AI Analysis

This work examines the linguistic capabilities of pre-trained language models for NLP researchers. It reveals that the models fail at zero-shot compositional questions and that global model decisions (architecture, directionality, dataset size, pre-training objective) do not predict those capabilities. The contribution is an incremental analysis of existing models.

The study evaluated 29 diverse transformer models, including T5, BART, ALBERT, and GPT variants, on the oLMpics benchmark and psycholinguistic probes. None could handle compositional questions zero-shot, which suggests that this skill is not learnable with existing pre-training objectives. Global choices such as architecture, directionality, dataset size, and pre-training objective did not predict a model's linguistic capabilities.

Existing pre-trained transformer analysis works usually focus only on one or two model families at a time, overlooking the variability of the architecture and pre-training objectives. In our work, we utilize the oLMpics benchmark and psycholinguistic probing datasets for a diverse set of 29 models including T5, BART, and ALBERT. Additionally, we adapt the oLMpics zero-shot setup for autoregressive models and evaluate GPT networks of different sizes. Our findings show that none of these models can resolve compositional questions in a zero-shot fashion, suggesting that this skill is not learnable using existing pre-training objectives. Furthermore, we find that global model decisions such as architecture, directionality, size of the dataset, and pre-training objective are not predictive of a model's linguistic capabilities.
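The zero-shot setup for autoregressive models mentioned in the abstract can be pictured roughly as follows. This is a minimal sketch, not the authors' implementation: it assumes each candidate answer is substituted into the masked slot and the candidate yielding the highest whole-sentence log-probability is chosen. The names `zero_shot_choice`, `toy_log_prob`, and `TOY_UNIGRAM`, as well as the example sentence, are hypothetical; the toy scorer merely stands in for a real language model.

```python
import math
from typing import Callable, Sequence

MASK = "[MASK]"  # placeholder token used in this sketch (an assumption)


def zero_shot_choice(
    template: str,
    candidates: Sequence[str],
    sentence_log_prob: Callable[[str], float],
) -> str:
    """Fill the mask slot with each candidate and return the best-scoring one."""
    if MASK not in template:
        raise ValueError("template must contain the mask placeholder")
    # Score every fully substituted sentence with the supplied scorer.
    scores = {c: sentence_log_prob(template.replace(MASK, c)) for c in candidates}
    return max(scores, key=scores.get)


# Hypothetical stand-in for a language model: a fixed unigram table.
# A real evaluation would sum token log-probabilities from an autoregressive model.
TOY_UNIGRAM = {"older": 0.6, "younger": 0.1}


def toy_log_prob(sentence: str) -> float:
    """Sum of per-token log-probabilities; unknown tokens get probability 0.3."""
    return sum(math.log(TOY_UNIGRAM.get(tok, 0.3)) for tok in sentence.split())


if __name__ == "__main__":
    question = "A 41 year old person is [MASK] than a 24 year old person"
    print(zero_shot_choice(question, ["older", "younger"], toy_log_prob))
```

Because the scorer is passed in as a plain callable, the same selection logic could be reused with any model that returns a sentence-level log-probability.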

