Semantic Anchors in In-Context Learning: Why Small LLMs Cannot Flip Their Labels
This work clarifies fundamental limits of few-shot prompting for small LLMs by showing that in-context demonstrations do not override pre-trained label semantics at these scales. The finding is incremental but relevant to researchers in natural language processing and AI.
The study investigated whether in-context learning (ICL) can override pre-trained label semantics in small LLMs (1--12B parameters) by testing eight open-source models with natural and inverted demonstrations across eight classification tasks. It found that few-shot ICL did not flip label meanings at these scales: semantic override rates remained exactly zero, indicating that ICL primarily adjusts how inputs project onto stable semantic directions rather than remapping labels.
Can in-context learning (ICL) override pre-trained label semantics, or does it merely refine an existing semantic backbone? We address this question by treating LLMs as prompt-induced classifiers and contrasting their behavior under \emph{natural} demonstrations (with correct labels) and \emph{inverted} demonstrations (systematically flipping label meanings). We decompose ICL behavior into three alignment metrics (truth, prior, and prompt alignment) and introduce a semantic override rate, defined as correctness under flipped semantics. Across eight classification tasks and eight open-source LLMs (1--12B parameters), we find consistent evidence for a semantic anchor view. With natural demonstrations, ICL improves accuracy while maintaining strong prior alignment; most correct predictions coincide with zero-shot behavior, even when the prior is weak. With inverted demonstrations, models cannot learn coherent anti-semantic classifiers: prompt alignment increases only by sacrificing accuracy, and semantic override rates remain exactly zero in our few-shot 1--12B setting. Rather than flexibly remapping label meanings, ICL primarily adjusts how inputs project onto stable semantic directions learned during pre-training, clarifying fundamental limits of few-shot prompting and suggesting that overriding label semantics at these scales requires interventions beyond ICL. All code is available at: https://github.com/AnanthaPadmanaban-KrishnaKumar/semantic-anchors-icl.
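To make the three alignment metrics concrete, the following minimal sketch treats each one as a simple agreement rate between the few-shot predictions and a reference labeling: the ground truth (truth alignment), the model's own zero-shot predictions (prior alignment), and the labeling implied by the demonstrations (prompt alignment). These operationalizations, the function names `agreement`, `flip`, and `alignment_metrics`, and the toy labels are assumptions made for illustration; they are not taken from the paper's implementation, and the linked repository remains the authoritative reference for the exact definitions, including that of the semantic override rate.

```python
from typing import Dict, Sequence


def agreement(preds: Sequence[str], reference: Sequence[str]) -> float:
    """Fraction of positions where two label sequences agree."""
    if len(preds) != len(reference) or len(preds) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(p == r for p, r in zip(preds, reference)) / len(preds)


def flip(labels: Sequence[str], mapping: Dict[str, str]) -> list:
    """Apply a label permutation, e.g. the one used for inverted demonstrations."""
    return [mapping[label] for label in labels]


def alignment_metrics(
    icl_preds: Sequence[str],
    gold: Sequence[str],
    zero_shot_preds: Sequence[str],
    prompt_targets: Sequence[str],
) -> Dict[str, float]:
    """Assumed operationalization: each metric is an agreement rate.

    prompt_targets is the labeling implied by the demonstrations:
    the gold labels for natural prompts, the flipped labels for inverted ones.
    """
    return {
        "truth": agreement(icl_preds, gold),
        "prior": agreement(icl_preds, zero_shot_preds),
        "prompt": agreement(icl_preds, prompt_targets),
    }


# Hypothetical toy data for a binary sentiment task (not results from the paper).
gold = ["pos", "neg", "pos", "neg"]
zero_shot = ["pos", "neg", "neg", "neg"]
icl_inverted = ["pos", "pos", "neg", "neg"]  # predictions under inverted demos

inverted_targets = flip(gold, {"pos": "neg", "neg": "pos"})
metrics = alignment_metrics(icl_inverted, gold, zero_shot, inverted_targets)
print(metrics)
```

Under these assumptions, a binary task with inverted demonstrations has prompt targets that are the exact complement of the gold labels, so truth alignment and prompt alignment always sum to one. This is one simple way to see why, in such a setting, prompt alignment can rise only at the expense of accuracy.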