Exploring Generalization Ability of Pretrained Language Models on Arithmetic and Logical Reasoning
This work addresses the generalization limitations of pre-trained language models (PLMs) on reasoning tasks, an issue that matters for AI reliability. The contribution is incremental, however, because it builds on existing models and tasks.
The study investigated the generalization ability of pre-trained language models (PLMs) on arithmetic and logical reasoning tasks, finding that PLMs generalize well within the same data distribution but struggle significantly with out-of-distribution test data.
To explore the generalization ability of pre-trained language models (PLMs) both quantitatively and intuitively, we design several arithmetic and logical reasoning tasks. We analyse how well PLMs generalize both when the test data follows the same distribution as the training data and when it does not; for the latter analysis, we design a cross-distribution test set in addition to the in-distribution test set. We conduct experiments on BART, one of the most advanced publicly released generative PLMs. We find that PLMs generalize easily when the distribution is the same, but that generalizing out of distribution remains difficult for them.
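As a rough illustration of the in-distribution versus cross-distribution setup described above, the following minimal sketch builds an addition task with two test sets. The operand ranges, set sizes, prompt format, and all function names are assumptions made for illustration and are not taken from the paper. The `oracle` function is only a stand-in for a fine-tuned model such as BART.

```python
import random


def make_addition_examples(n, low, high, seed):
    """Return n (prompt, answer) pairs with operands drawn uniformly from [low, high]."""
    rng = random.Random(seed)
    examples = []
    for _ in range(n):
        a, b = rng.randint(low, high), rng.randint(low, high)
        examples.append((f"{a} + {b}", str(a + b)))
    return examples


def exact_match_accuracy(predict, examples):
    """Fraction of examples for which predict(prompt) equals the reference answer."""
    correct = sum(1 for prompt, answer in examples if predict(prompt) == answer)
    return correct / len(examples)


# Hypothetical ranges: train and in-distribution test share two-digit operands,
# while the cross-distribution test uses three-digit operands never seen in training.
train_set = make_addition_examples(1000, 0, 99, seed=0)
in_dist_test = make_addition_examples(200, 0, 99, seed=1)
cross_dist_test = make_addition_examples(200, 100, 999, seed=2)


def oracle(prompt):
    """Stand-in predictor that parses the prompt and adds exactly (not a PLM)."""
    a, b = prompt.split(" + ")
    return str(int(a) + int(b))


in_dist_acc = exact_match_accuracy(oracle, in_dist_test)
cross_dist_acc = exact_match_accuracy(oracle, cross_dist_test)
```

Replacing `oracle` with a function that decodes a model's output for each prompt would let the two accuracies be compared directly, which is the kind of gap the study reports.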