AI SEMar 3

The Tool-Overuse Illusion: Why Does LLM Prefer External Tools over Internal Knowledge?

Yirong Zeng, Shen You, Yufei Liu, Qunyao Du, Xiao Ding, Yutai Hou, Yuxian Wang, Wu Ning, Haonan Song, Dandan Tu, Bibo Cai, Ting Liu

arXiv:2604.19749h-index: 5

AI Analysis

This addresses a critical inefficiency in LLM tool-use for AI practitioners, offering incremental improvements to mitigate overreliance on external resources.

The paper tackles the problem of tool overuse in LLMs, where models unnecessarily rely on external tools during reasoning, and finds that this issue is pervasive across diverse models. It proposes strategies to reduce tool usage by up to 82.8% while improving accuracy, and identifies reward structures as a cause, with interventions cutting unnecessary tool calls by up to 66.7% without sacrificing accuracy.

Equipping LLMs with external tools effectively addresses internal reasoning limitations. However, it introduces a critical yet under-explored phenomenon: tool overuse, the unnecessary tool-use during reasoning. In this paper, we first reveal this phenomenon is pervasive across diverse LLMs. We then experimentally elucidate its underlying mechanisms through two key lenses: (1) First, by analyzing tool-use behavior across different internal knowledge availability regions, we identify a \textit{knowledge epistemic illusion}: models misjudge internal knowledge boundaries and fail to accurately perceive their actual knowledge availability. To mitigate this, we propose a knowledge-aware epistemic boundary alignment strategy based on direct preference optimization, which reduces tool usage in by 82.8\% while yielding an accuracy improvement. (2) Second, we establish a causal link between reward structures and tool-use behavior by visualizing the tool-augmented training process. It reveals that \textit{outcome-only rewards} inadvertently encourage tool overuse by rewarding only final correctness, regardless of tool efficiency. To verify this, we balance reward signals during training rather than relying on outcome-only rewards, cutting unnecessary tool calls by 66.7\% (7B) and 60.7\% (32B) without sacrificing accuracy. Finally, we provide theoretical justification in this two lenses to understand tool overuse.

View on arXiv PDF

Similar