AI, CL | May 7

More Is Not Always Better: Cross-Component Interference in LLM Agent Scaffolding

arXiv:2605.05716 | 64.0 | h-index: 3

AI Analysis

For practitioners building LLM agent systems, this work shows that the common assumption that 'more components is better' does not hold, and it provides evidence for choosing a task-specific subset of components over a maximally equipped default.

LLM agent systems can suffer from cross-component interference (CCI), where adding more scaffolding components (planning, tools, memory, self-reflection, retrieval) degrades performance instead of improving it. On HotpotQA, a single-tool agent outperforms the All-In system (all five components enabled) by a relative 32% (F1 0.233 vs 0.177), and on GSM8K, a 3-component subset beats All-In by a relative 79% (0.43 vs 0.24).
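The percentages above are relative gains over the All-In score, not absolute differences. A minimal sketch of that arithmetic, using only the four scores quoted in the summary (the helper name `relative_gain` is ours, not the paper's):

```python
def relative_gain(best: float, baseline: float) -> float:
    """Relative improvement of `best` over `baseline`, as a fraction."""
    return (best - baseline) / baseline


# Scores quoted in the summary: best subset vs All-In.
hotpot = relative_gain(0.233, 0.177)  # HotpotQA, single-tool agent (F1)
gsm8k = relative_gain(0.43, 0.24)     # GSM8K, 3-component subset

print(f"HotpotQA: {hotpot:.1%}  GSM8K: {gsm8k:.1%}")
```

Rounded to whole percentages, these match the 32% and 79% reported.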

LLM agent systems are built by stacking scaffolding components (planning, tools, memory, self-reflection, retrieval) assuming more is better. We study cross-component interference (CCI): degradation when components interact destructively. We run a full factorial experiment over all 2^5=32 subsets of five components on HotpotQA and GSM8K with Llama-3.1-8B/70B (96 conditions, up to 10 seeds). The All-In system is consistently suboptimal: on HotpotQA, a single-tool agent surpasses All-In by 32% (F1 0.233 vs 0.177, p=0.023); on GSM8K, a 3-component subset beats All-In by 79% (0.43 vs 0.24, p=0.010). Optimal component count is task-dependent (k*=1-4) and scale-sensitive: at 70B, combinations that hurt at 8B provide gains, though All-In still trails the best subset. We fit a main-effects regression (R^2=0.916, adj-R^2=0.899, LOOCV=0.872), compute exact Shapley values, and find 183/325 submodularity violations (56.3%), showing greedy selection is unreliable. A three-body synergy among Tool Use, Self-Reflection, and Retrieval (INT_3=+0.175, 95% CI [+0.003,+0.351]) is reported as exploratory. CCI replicates across model families (Qwen2.5) and is robust to prompt paraphrasing. Our findings suggest maximally-equipped agent defaults should be replaced by task-specific subset selection via interaction-aware analysis.
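The analysis pipeline described in the abstract (enumerate all 2^5 subsets, compute exact Shapley values, count submodularity violations) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: `toy_score` is a made-up stand-in for a measured benchmark score, the component labels are shorthand, and the violation count uses the local pairwise test f(S+i) - f(S) >= f(S+i+j) - f(S+j), which need not be the counting scheme behind the paper's 183/325 figure.

```python
from itertools import combinations
from math import factorial

COMPONENTS = ("planning", "tools", "memory", "reflection", "retrieval")


def all_subsets(items):
    """Yield every subset of `items` as a frozenset (2^n in total)."""
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)


def toy_score(subset):
    """HYPOTHETICAL score function; replace with measured F1/accuracy."""
    gain = {"planning": 0.02, "tools": 0.12, "memory": -0.01,
            "reflection": 0.03, "retrieval": 0.04}
    score = 0.10 + sum(gain[c] for c in subset)
    # Invented three-way synergy and an invented crowding penalty.
    if {"tools", "reflection", "retrieval"} <= subset:
        score += 0.05
    score -= 0.02 * max(0, len(subset) - 2) ** 2
    return score


def exact_shapley(scores, items):
    """Exact Shapley value of each item from a full table of subset scores."""
    n = len(items)
    phi = {}
    for i in items:
        others = [c for c in items if c != i]
        total = 0.0
        for s in all_subsets(others):
            weight = factorial(len(s)) * factorial(n - len(s) - 1) / factorial(n)
            total += weight * (scores[s | {i}] - scores[s])
        phi[i] = total
    return phi


def count_submodularity_violations(scores, items, tol=1e-12):
    """Count (S, {i, j}) cases where adding i helps more after j was added."""
    violations = checks = 0
    for s in all_subsets(items):
        rest = [c for c in items if c not in s]
        for i, j in combinations(rest, 2):
            checks += 1
            gain_small = scores[s | {i}] - scores[s]
            gain_large = scores[s | {i, j}] - scores[s | {j}]
            if gain_large > gain_small + tol:
                violations += 1
    return violations, checks


scores = {s: toy_score(s) for s in all_subsets(COMPONENTS)}
shapley = exact_shapley(scores, COMPONENTS)
violations, checks = count_submodularity_violations(scores, COMPONENTS)
best = max(scores, key=scores.get)

print("best subset:", sorted(best), "all-in:", scores[frozenset(COMPONENTS)])
print("shapley:", shapley)
print(f"submodularity violations: {violations}/{checks}")
```

When the score function is submodular, greedy forward selection carries a known approximation guarantee; each violation is a case where a component becomes more valuable after another one is added, which is why the abstract treats a high violation rate as evidence that greedy selection is unreliable.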
