Comparative Analysis of Demonstration Selection Algorithms for LLM In-Context Learning
This comparative analysis helps practitioners choose suitable demonstration selection methods for real-world LLM applications by clarifying how inconsistently these methods perform across tasks.
This paper systematically evaluated six demonstration selection algorithms for LLM in-context learning on five datasets. It found substantial performance variation across tasks, with some methods failing to beat random selection. It also showed that adding more demonstrations does not always improve performance and that accuracy often trades off against computational efficiency.
In-context learning allows Large Language Models (LLMs) to adapt to new tasks without additional training. However, its performance depends heavily on the quality of the demonstrations, which has driven research into demonstration selection algorithms. Given a test input, these algorithms help users select the best $k$ input-label pairs (demonstration examples), so that the LLM can learn in context the relationship between the provided examples and the test input. Despite the many algorithms proposed, their efficiency and effectiveness remain unclear. This lack of clarity makes it difficult to apply them in real-world scenarios and complicates future research aimed at developing improved methods. This paper revisits six proposed algorithms, evaluating them on five datasets from both efficiency and effectiveness perspectives. Our experiments reveal significant variations in algorithm performance across tasks, with some methods struggling to outperform random selection in certain scenarios. We also find that increasing the number of demonstrations does not always improve performance, and that accuracy often trades off against computational efficiency. Our code is available at https://github.com/Tizzzzy/Demonstration_Selection_Overview.
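To make the selection step concrete, the following is a minimal sketch of one common family of approaches, similarity-based retrieval, shown next to the random-selection baseline. It is an illustration under stated assumptions, not the authors' code, and it does not reproduce any specific algorithm evaluated in the paper. The bag-of-words `embed` function is a toy stand-in for a real sentence encoder, and the names `select_topk`, `select_random`, and `build_prompt`, the prompt template, and the example pool are all hypothetical.

```python
import math
import random
from collections import Counter

# A demonstration is an (input, label) pair.
Demo = tuple[str, str]


def embed(text: str) -> Counter:
    """Toy bag-of-words vector (hypothetical stand-in for a sentence encoder)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_topk(pool: list[Demo], test_input: str, k: int) -> list[Demo]:
    """Pick the k pool examples whose inputs are most similar to the test input."""
    query = embed(test_input)
    ranked = sorted(pool, key=lambda d: cosine(embed(d[0]), query), reverse=True)
    return ranked[:k]


def select_random(pool: list[Demo], k: int, seed: int = 0) -> list[Demo]:
    """Random-selection baseline: k examples drawn without replacement."""
    return random.Random(seed).sample(pool, k)


def build_prompt(demos: list[Demo], test_input: str) -> str:
    """Format the selected demonstrations followed by the unlabeled test input."""
    blocks = [f"Input: {x}\nLabel: {y}" for x, y in demos]
    blocks.append(f"Input: {test_input}\nLabel:")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    # Hypothetical sentiment pool used only for illustration.
    pool = [
        ("the movie was wonderful and moving", "positive"),
        ("a dull and boring plot", "negative"),
        ("wonderful acting and a moving score", "positive"),
        ("the service was slow", "negative"),
    ]
    test = "a wonderful and moving film"
    print(build_prompt(select_topk(pool, test, k=2), test))
    print(build_prompt(select_random(pool, k=2), test))
```

Similarity-based selection costs one similarity computation per pool example for every test input, while random selection costs almost nothing. This is one simple way the accuracy-efficiency trade-off mentioned above can arise, and it grows with pool size and encoder cost.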