NICE: To Optimize In-Context Examples or Not?
This work addresses inefficient prompt engineering by giving researchers and practitioners a heuristic for deciding whether to optimize instructions or in-context examples (ICE). The contribution is incremental, since it builds on existing in-context learning research.
The paper challenges the consensus that optimizing in-context examples (ICE) is crucial for large language models by showing that with detailed task-specific instructions, ICE optimization yields diminishing returns on many tasks. It introduces a metric called NICE to predict when ICE optimization is useful, enabling better resource allocation for new tasks.
Recent work shows that in-context learning and optimization of in-context examples (ICE) can significantly improve the accuracy of large language models (LLMs) on a wide range of tasks, leading to an apparent consensus that ICE optimization is crucial for better performance. However, most of these studies assume either a fixed instruction or no instruction in the prompt. We challenge this consensus by investigating whether optimizing ICE is necessary when task-specific instructions are provided, and find that for many tasks it yields diminishing returns. In particular, using a diverse set of tasks and a systematically created instruction set with gradually added details, we find that as the prompt instruction becomes more detailed, the returns on ICE optimization diminish. To characterize this behavior, we introduce a task-specific metric called Normalized Invariability to Choice of Examples (NICE) that quantifies the learnability of tasks from a given instruction and provides a heuristic to help decide whether to optimize instructions or ICE for a new task. Given a task, the proposed metric can reliably predict the utility of optimizing ICE compared to using random ICE. Our code is available at https://github.com/microsoft/nice-icl.
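The abstract does not give the formula for NICE, so the following Python sketch is only an illustrative proxy and should not be read as the paper's definition. It assumes that a task has already been evaluated under one fixed instruction with several different candidate ICE sets, and it takes the ratio of the mean score to the best score as a rough measure of how little the choice of examples matters. The function names `invariability_score` and `suggest_focus`, and the threshold of 0.9, are hypothetical choices made for this example.

```python
from statistics import mean


def invariability_score(scores_by_example_set):
    """Illustrative proxy (NOT the paper's NICE formula).

    Takes task scores obtained with different ICE sets under one fixed
    instruction and returns mean / max. A value near 1 means the score
    barely depends on which examples were chosen.
    """
    if not scores_by_example_set:
        raise ValueError("need at least one score")
    best = max(scores_by_example_set)
    if best <= 0:
        # No example set achieved a positive score; report zero invariability.
        return 0.0
    return mean(scores_by_example_set) / best


def suggest_focus(score, threshold=0.9):
    """Hypothetical decision rule; the 0.9 threshold is an assumption.

    High invariability suggests that ICE optimization has little room to
    help, so effort is better spent on the instruction.
    """
    return "instruction" if score >= threshold else "examples"


if __name__ == "__main__":
    # Made-up accuracies for four candidate ICE sets, for illustration only.
    accuracies = [0.81, 0.80, 0.82, 0.79]
    score = invariability_score(accuracies)
    print(round(score, 3), suggest_focus(score))
```

Under this proxy, a task whose accuracy is nearly the same for every example set scores close to 1 and is flagged for instruction optimization, while a task with a large gap between the average and the best example set is flagged for ICE optimization.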