Prompt Engineering and Calibration for Zero-Shot Commonsense Reasoning
This work addresses a practical problem for AI practitioners: improving reasoning in smaller language models. It is incremental, building on existing techniques rather than introducing a major breakthrough.
The study investigated prompt engineering and calibration strategies for improving zero-shot commonsense reasoning in smaller language models, finding that while individual strategies benefit certain models, their combined effects are mostly negative across five benchmarks.
Prompt engineering and calibration help large language models excel at reasoning tasks, including multiple-choice commonsense reasoning. From a practical perspective, we investigate and evaluate these strategies on smaller language models. Through experiments on five commonsense reasoning benchmarks, we find that each strategy favors certain models, but their joint effects are mostly negative.
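To make the idea of calibration in multiple-choice scoring concrete, here is a minimal sketch. It assumes one common form of calibration, in which each option's log-probability under the question prompt is reduced by its log-probability under a content-free prompt; the abstract does not say which calibration method the study uses, so this is an illustration, not the authors' procedure. The function name `pick_answer` and all numbers are hypothetical.

```python
def pick_answer(cond_logps, prior_logps=None):
    """Return the index of the highest-scoring answer option.

    cond_logps[i]  : log P(option_i | question prompt)
    prior_logps[i] : log P(option_i | content-free prompt); None disables calibration
    (Assumed calibration form: subtract the prior log-probability from each score.)
    """
    if prior_logps is None:
        scores = list(cond_logps)
    else:
        # Down-weight options the model favors regardless of the question.
        scores = [c - p for c, p in zip(cond_logps, prior_logps)]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical log-probabilities for three answer options (illustration only).
cond = [-4.0, -5.0, -6.5]
prior = [-3.0, -6.0, -6.0]

uncalibrated = pick_answer(cond)        # raw scores
calibrated = pick_answer(cond, prior)   # prior-adjusted scores
```

With these made-up numbers, the raw scores select the first option, while the prior-adjusted scores select the second, because the first option is likely even without the question. In practice the log-probabilities would come from the language model being evaluated.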