CL · Jun 21, 2024

ICLEval: Evaluating In-Context Learning Ability of Large Language Models

arXiv:2406.14955v2 · 24 citations · Has Code
AI Analysis

This work addresses the need for better evaluation of ICL in LLMs. The contribution is incremental: it builds on existing benchmarks by focusing on a specific, overlooked capability.

The paper tackles the evaluation of In-Context Learning (ICL) ability in Large Language Models, which existing frameworks often overlook. It introduces the ICLEval benchmark, which assesses exact copying and rule learning, and finds that ICL ability is universally present, is not solely dependent on model size, and develops early in pretraining.

In-Context Learning (ICL) is a critical capability of Large Language Models (LLMs) as it empowers them to comprehend and reason across interconnected inputs. Evaluating the ICL ability of LLMs can enhance their utilization and deepen our understanding of how this ability is acquired at the training stage. However, existing evaluation frameworks primarily focus on language abilities and knowledge, often overlooking the assessment of ICL ability. In this work, we introduce the ICLEval benchmark to evaluate the ICL abilities of LLMs, which encompasses two key sub-abilities: exact copying and rule learning. Through the ICLEval benchmark, we demonstrate that ICL ability is universally present in different LLMs, and model size is not the sole determinant of ICL efficacy. Surprisingly, we observe that ICL abilities, particularly copying, develop early in the pretraining process and stabilize afterward. Our source code and benchmark are released at https://github.com/yiye3/ICLEval.
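To make the two sub-abilities named in the abstract concrete, the toy sketch below builds one kind of exact-copying probe and one kind of rule-learning probe and scores them by exact match. It is an illustration only, not the ICLEval task format or code: the prompt layouts, the "reverse the string" rule, and all function names (`make_copy_example`, `make_rule_example`, `exact_match_accuracy`, `lookup_baseline`) are assumptions made here, and `model` is any hypothetical callable that maps a prompt string to an output string.

```python
import random
import string


def random_token(rng, length=8):
    """Return a random alphanumeric string that cannot be known in advance."""
    alphabet = string.ascii_lowercase + string.digits
    return "".join(rng.choice(alphabet) for _ in range(length))


def make_copy_example(rng, n_pairs=5):
    """Assumed copying probe: 'key: value' lines, then a query for one key."""
    pairs = {}
    while len(pairs) < n_pairs:
        pairs[random_token(rng)] = random_token(rng)
    query_key = rng.choice(list(pairs))
    context = "\n".join(f"{k}: {v}" for k, v in pairs.items())
    return f"{context}\n{query_key}:", pairs[query_key]


def make_rule_example(rng, n_shots=4):
    """Assumed rule probe: few-shot demos whose hidden rule is string reversal."""
    shots = [random_token(rng, 5) for _ in range(n_shots + 1)]
    demos = "\n".join(f"{s} -> {s[::-1]}" for s in shots[:-1])
    return f"{demos}\n{shots[-1]} ->", shots[-1][::-1]


def exact_match_accuracy(model, examples):
    """Fraction of examples where the stripped model output equals the target."""
    hits = sum(model(prompt).strip() == target for prompt, target in examples)
    return hits / len(examples)


def lookup_baseline(prompt):
    """Stand-in for a model: copies the queried key's value from the context."""
    *context, query = prompt.split("\n")
    key = query.rstrip(":")
    for line in context:
        k, _, v = line.partition(": ")
        if k == key:
            return v
    return ""  # no matching key found, e.g. on rule-learning prompts


if __name__ == "__main__":
    rng = random.Random(0)
    copy_set = [make_copy_example(rng) for _ in range(20)]
    rule_set = [make_rule_example(rng) for _ in range(20)]
    print("copying:", exact_match_accuracy(lookup_baseline, copy_set))
    print("rule learning:", exact_match_accuracy(lookup_baseline, rule_set))
```

The stand-in `lookup_baseline` can only copy, so it solves every copying probe and none of the rule probes, which shows why the two sub-abilities are worth measuring separately. To evaluate an actual LLM under these assumptions, one would replace `lookup_baseline` with a wrapper around the model's text-generation call.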

Code Implementations · 2 repos
