CL, Dec 16, 2022

ALERT: Adapting Language Models to Reasoning Tasks

Ping Yu, Tianlu Wang, Olga Golovneva, Badr AlKhamissi, Siddharth Verma, Zhijing Jin, Gargi Ghosh, Mona Diab, Asli Celikyilmaz

Berkeley, Meta AI, Microsoft, U of Toronto

arXiv:2212.08286v24.620 citations, h-index: 48

Originality: Incremental advance

AI Analysis

This work addresses the problem of understanding the reasoning capabilities of language models, a question of interest to AI researchers, but it is incremental because it builds on existing benchmarks.

The authors introduce ALERT, a benchmark for assessing whether language models apply reasoning skills or memorize training data, and find that finetuning improves skills such as textual entailment but leads to overfitting to the prompt template.

Current large language models can perform reasonably well on complex tasks that require step-by-step reasoning with few-shot learning. Are these models applying reasoning skills they have learnt during pre-training and reasoning outside of their training context, or are they simply memorizing their training corpus at finer granularity and learning to better understand their context? To tease apart these possibilities, we introduce ALERT, a benchmark and suite of analyses for assessing language models' reasoning ability by comparing pre-trained and finetuned models on complex tasks that require reasoning skills to solve. ALERT provides a test bed to assess any language model on fine-grained reasoning skills; it spans over 20 datasets and covers 10 different reasoning skills. We leverage ALERT to further investigate the role of finetuning. With extensive empirical analysis, we find that language models learn more reasoning skills, such as textual entailment, abductive reasoning, and analogical reasoning, during the finetuning stage than during the pretraining stage. We also find that when language models are finetuned they tend to overfit to the prompt template, which hurts the robustness of models and causes generalization problems.
