CL, AI, LG | Mar 29, 2022

Evaluating Prompts Across Multiple Choice Tasks In a Zero-Shot Setting

arXiv:2203.15754v1 | Has Code
Originality: Incremental advance
AI Analysis

This work addresses the challenge of prompt engineering for researchers and practitioners using large language models, though it is an incremental advance that builds on existing prompt-based methods.

The paper investigates how the qualities of a prompt affect zero-shot performance in large language models by evaluating standardized prompts across multiple-choice tasks. It finds that including the answer choices in the prompt and using prompts not seen during pre-training significantly improve performance.

Large language models have shown that impressive zero-shot performance can be achieved through natural language prompts (Radford et al., 2019; Brown et al., 2020; Sanh et al., 2021). Creating an effective prompt, however, requires significant trial and error. That prompts the question: how do the qualities of a prompt affect its performance? To this end, we collect and standardize prompts from a diverse range of tasks for use with tasks they were not designed for. We then evaluate these prompts across a fixed set of multiple-choice datasets for a quantitative analysis of how certain attributes of a prompt affect performance. We find that including the choices and using prompts not used during pre-training provide significant improvements. All experiments and code can be found at https://github.com/gabeorlanski/zero-shot-cross-task.
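The evaluation described above can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' code: it assumes rank classification (score every answer choice as a continuation of the prompt and predict the highest-scoring one), which is a common setup for zero-shot multiple choice, and it assumes a caller-supplied `Scorer` that returns something like log p(continuation | prompt). The names `render_prompt`, `predict`, `accuracy`, the `"Choices:"` / `"Answer:"` formatting, and the template syntax are all invented for illustration and may differ from the repository. The `include_choices` flag mirrors the attribute the abstract reports as helpful.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical scorer type: returns a score such as log p(continuation | prompt)
# under some language model. Not an API from the paper's repository.
Scorer = Callable[[str, str], float]


def render_prompt(template: str, question: str, choices: Sequence[str],
                  include_choices: bool) -> str:
    """Fill a hypothetical standardized template, optionally listing the choices."""
    prompt = template.format(question=question)
    if include_choices:
        # Attribute under study: expose the answer options inside the prompt.
        prompt += "\nChoices: " + " | ".join(choices)
    return prompt + "\nAnswer:"


def predict(scorer: Scorer, prompt: str, choices: Sequence[str]) -> int:
    """Rank classification: return the index of the highest-scoring choice."""
    scores = [scorer(prompt, " " + choice) for choice in choices]
    # max() keeps the first index on ties.
    return max(range(len(choices)), key=scores.__getitem__)


def accuracy(scorer: Scorer, template: str, dataset: List[Dict],
             include_choices: bool) -> float:
    """Fraction of examples whose predicted index equals the gold label."""
    correct = 0
    for ex in dataset:
        prompt = render_prompt(template, ex["question"], ex["choices"],
                               include_choices)
        correct += int(predict(scorer, prompt, ex["choices"]) == ex["label"])
    return correct / len(dataset)


if __name__ == "__main__":
    data = [
        {"question": "What is 2 + 2?", "choices": ["3", "4"], "label": 1},
        {"question": "What is the capital of France?",
         "choices": ["Paris", "Rome"], "label": 0},
    ]
    # Fake scorer standing in for an LM. It ignores the prompt, so both
    # settings score the same here; a real model's scores depend on the prompt.
    fake_logprobs = {"3": -2.0, "4": -0.1, "Paris": -0.2, "Rome": -1.5}
    fake_scorer: Scorer = lambda prompt, cont: fake_logprobs[cont.strip()]
    for flag in (False, True):
        print(flag, accuracy(fake_scorer, "Question: {question}", data, flag))
```

Comparing `accuracy(...)` with `include_choices=False` and `include_choices=True` for the same real scorer and template is one way to probe the effect the abstract reports. Whether to length-normalize the choice scores is a further design choice this sketch leaves out.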

Code Implementations: 1 repo