CL · May 3, 2022

ElitePLM: An Empirical Study on General Language Ability Evaluation of Pretrained Language Models

Junyi Li, Tianyi Tang, Zheng Gong, Lixin Yang, Zhuohao Yu, Zhipeng Chen, Jingyuan Wang, Wayne Xin Zhao, Ji-Rong Wen

Peking U

arXiv:2205.01523v131.8631 citations · h-index: 70 · Has Code

Originality: Synthesis-oriented

AI Analysis

This work provides a benchmark for guiding the selection and application of PLMs in NLP tasks, but it is incremental as it focuses on evaluation rather than new methods.

The authors address the lack of systematic evaluation of pretrained language models' general language abilities with a large-scale empirical study of ten PLMs across four dimensions (memory, comprehension, reasoning, composition). They find that performance varies with training objectives and strategies, and that fine-tuning is sensitive to data size and distribution.

Nowadays, pretrained language models (PLMs) have dominated the majority of NLP tasks. However, little research has been conducted on systematically evaluating the language abilities of PLMs. In this paper, we present a large-scale empirical study on general language ability evaluation of PLMs (ElitePLM). In our study, we design four evaluation dimensions, i.e., memory, comprehension, reasoning, and composition, to measure ten widely used PLMs within five categories. Our empirical results demonstrate that: (1) PLMs with varying training objectives and strategies are good at different ability tests; (2) fine-tuning PLMs on downstream tasks is usually sensitive to the data size and distribution; (3) PLMs have excellent transferability between similar tasks. Moreover, the prediction results of PLMs in our experiments are released as an open resource for deeper and more detailed analysis of the language abilities of PLMs. This paper can guide future work in selecting, applying, and designing PLMs for specific tasks. We have made all the details of the experiments publicly available at https://github.com/RUCAIBox/ElitePLM.
