Revisiting Automated Prompting: Are We Actually Doing Better?
This work highlights the need to use manual prompts as a baseline in automated prompting research, addressing a methodological gap for researchers in NLP and AI.
The paper revisits automated prompting techniques across six downstream tasks and various few-shot settings, finding that automated prompting does not consistently outperform simple manual prompts.
Current literature demonstrates that Large Language Models (LLMs) are strong few-shot learners, and that prompting significantly increases their performance on a range of downstream tasks in a few-shot learning setting. Attempts to automate human-led prompting followed, with some progress. In particular, subsequent work demonstrates that automated prompting can outperform fine-tuning in certain K-shot learning scenarios. In this paper, we revisit techniques for automated prompting on six different downstream tasks and a larger range of K-shot learning settings. We find that automated prompting does not consistently outperform simple manual prompts. Our work suggests that, in addition to fine-tuning, manual prompts should be used as a baseline in this line of research.