AI · Sep 30, 2025

ICL Optimized Fragility

arXiv:2510.00300v1
Originality: Incremental advance
AI Analysis

The study reports systematic trade-offs between efficiency and reasoning flexibility in LLMs, with implications for deployment and AI safety.

This study examined how in-context learning (ICL) guides affect reasoning across knowledge domains using variants of GPT-OSS:20b. ICL models achieved 91%-99% accuracy on general knowledge tasks but performed worse on complex reasoning problems (10-43% accuracy on riddles vs. 43% for the baseline), while performance on a mathematical olympiad problem showed no significant difference between variants.

ICL guides are known to improve task-specific performance, but their impact on cross-domain cognitive abilities remains unexplored. This study examines how ICL guides affect reasoning across different knowledge domains using six variants of the GPT-OSS:20b model: one baseline model and five ICL configurations (simple, chain-of-thought, random, appended text, and symbolic language). The models were evaluated on 840 tests spanning general knowledge questions, logic riddles, and a mathematical olympiad problem. Statistical analysis (ANOVA) revealed significant behavioral modifications (p < 0.001) across ICL variants, a phenomenon the authors term "optimized fragility." ICL models achieved 91%-99% accuracy on general knowledge tasks while showing degraded performance on complex reasoning problems, with accuracy dropping to 10-43% on riddles compared to 43% for the baseline model. Notably, no significant differences emerged on the olympiad problem (p = 0.2173), suggesting that complex mathematical reasoning remains unaffected by ICL optimization. These findings indicate that ICL guides create systematic trade-offs between efficiency and reasoning flexibility, with important implications for LLM deployment and AI safety.
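The one-way ANOVA mentioned above compares mean accuracy across model variants. The following is a minimal sketch, assuming each test is scored 1 (correct) or 0 (incorrect) and grouped by variant. The summary does not state the exact ANOVA design, and the group names and scores below are hypothetical illustrations, not the paper's data.

```python
from statistics import mean


def one_way_anova_f(groups: dict[str, list[float]]) -> tuple[float, int, int]:
    """Return (F statistic, df_between, df_within) for a one-way ANOVA.

    `groups` maps a (hypothetical) variant name to its per-test scores.
    """
    all_scores = [x for scores in groups.values() for x in scores]
    grand_mean = mean(all_scores)
    group_means = {name: mean(scores) for name, scores in groups.items()}

    # Between-group sum of squares: how far each variant's mean is from the grand mean
    ss_between = sum(
        len(scores) * (group_means[name] - grand_mean) ** 2
        for name, scores in groups.items()
    )
    # Within-group sum of squares: spread of scores around each variant's own mean
    ss_within = sum(
        (x - group_means[name]) ** 2
        for name, scores in groups.items()
        for x in scores
    )

    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # Hypothetical 0/1 scores on a riddle set; NOT results from the paper
    toy_scores = {
        "baseline": [1, 0, 1, 0],
        "icl_simple": [0, 0, 1, 0],
        "icl_cot": [1, 1, 1, 0],
    }
    f, dfb, dfw = one_way_anova_f(toy_scores)
    print(f"F({dfb}, {dfw}) = {f:.3f}")
```

Converting F into a p-value like those reported above requires the F distribution. In practice, `scipy.stats.f_oneway(*samples)` computes the same F statistic together with its p-value.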
