SEAI · Oct 17, 2025

Repairing Tool Calls Using Post-tool Execution Reflection and RAG

arXiv:2510.17874v1 · h-index: 16
Originality: Incremental advance
AI Analysis

This work addresses tool call failures in agentic systems that interact with external tools such as kubectl for Kubernetes. It is an incremental advance, since it applies existing reflection and RAG methods to a specific domain.

The paper tackled the problem of tool call failures in agentic systems by developing a post-tool execution reflection component that combines LLM-based reflection with domain-specific RAG, focusing on kubectl commands in Kubernetes. The results showed that this approach improved the successful-execution pass rate for 55% of the models evaluated and made commands 36% more likely, on average, to correctly answer the user query. Troubleshooting documents improved pass rates by an average of 10% compared to official documentation.
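The 55% figure counts how many models saw their pass rate go up; it is not the size of the gain. A minimal sketch separating the two quantities, assuming pass rate means the share of tool calls that execute successfully. The function names, model names, and numbers below are hypothetical toy values, not data from the paper.

```python
from typing import Dict, List


def pass_rate(outcomes: List[bool]) -> float:
    """Share of tool calls that executed successfully (True = passed)."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


def share_of_models_improved(before: Dict[str, float], after: Dict[str, float]) -> float:
    """Share of models whose pass rate is strictly higher after repair.

    This answers "for what fraction of models did repair help?",
    not "by how much did the pass rate rise?".
    """
    improved = sum(1 for model, rate in before.items() if after[model] > rate)
    return improved / len(before)


# Invented toy pass rates for four hypothetical models; not results from the paper.
BEFORE = {"model-a": 0.50, "model-b": 0.70, "model-c": 0.60, "model-d": 0.80}
AFTER = {"model-a": 0.65, "model-b": 0.70, "model-c": 0.72, "model-d": 0.78}

if __name__ == "__main__":
    print(pass_rate([True, False, True, True]))  # 0.75
    print(share_of_models_improved(BEFORE, AFTER))  # 0.5
```

In this toy data, half the models improve even though one of them regresses, which is why a "share of models improved" figure and an average gain can tell different stories.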

Agentic systems interact with external systems by calling tools such as Python functions, REST API endpoints, or command-line tools such as kubectl for Kubernetes. These tool calls often fail for various syntactic and semantic reasons. Some less obvious semantic errors can only be identified and resolved after analyzing the tool's response. To repair these errors, we develop a post-tool execution reflection component that combines large language model (LLM)-based reflection with domain-specific retrieval-augmented generation (RAG), using documents that describe the specific tool being called as well as troubleshooting documents related to the tool. For this paper, we focus on the use case of the kubectl command-line tool to manage Kubernetes, a platform for orchestrating cluster applications. Through a larger empirical study and a smaller manual evaluation, we find that our RAG-based reflection repairs kubectl commands so that, for 55% of the models evaluated, they are more likely to execute successfully (pass rate), and on average they are 36% more likely to correctly answer the user query. We find that troubleshooting documents improve pass rate compared to official documentation by an average of 10%.
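The paper's implementation is not reproduced here. The loop described above (execute the tool, inspect its response, retrieve tool and troubleshooting documents, let a reflector propose a repaired call) can be sketched as follows, under loudly stated assumptions: `execute` and `reflect` are hypothetical callables standing in for a kubectl runner and an LLM; the word-overlap `retrieve` replaces a real embedding retriever; `fake_kubectl`, `rule_based_reflector`, the documents, and the error text are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class ToolResult:
    command: str
    stdout: str
    stderr: str
    exit_code: int


def retrieve(query: str, docs: List[str], k: int = 2) -> List[str]:
    """Toy retriever: rank docs by word overlap with the query.
    Hypothetical stand-in for an embedding-based RAG retriever."""
    terms = set(query.lower().split())
    ranked = sorted(docs, key=lambda d: len(terms & set(d.lower().split())), reverse=True)
    return ranked[:k]


def reflect_and_repair(
    user_query: str,
    command: str,
    execute: Callable[[str], ToolResult],
    reflect: Callable[[str, ToolResult, List[str]], str],
    docs: List[str],
    max_rounds: int = 3,
) -> Tuple[str, ToolResult]:
    """Post-execution reflection: run the command, then repeatedly retrieve
    docs relevant to the response and ask the reflector for a repaired call."""
    result = execute(command)
    for _ in range(max_rounds):
        # The retrieval query includes the tool response, so errors that only
        # show up after execution can still be matched against the docs.
        context = retrieve(f"{user_query} {command} {result.stderr}", docs)
        repaired = reflect(user_query, result, context)
        if repaired == command:  # reflector accepts the current call
            break
        command = repaired
        result = execute(command)
    return command, result


# Invented documents: one "official", one "troubleshooting", one unrelated.
DOCS = [
    "kubectl get lists resources; pass --namespace to select a namespace",
    "error: unknown flag: --namespaces means the flag is misspelled, use --namespace instead",
    "kubectl scale sets the replica count of a deployment",
]


def fake_kubectl(cmd: str) -> ToolResult:
    # Simulated responses only; no real cluster or kubectl binary is used.
    if "--namespaces" in cmd:
        return ToolResult(cmd, "", "error: unknown flag: --namespaces", 1)
    return ToolResult(cmd, "NAME    READY   STATUS\nweb-1   1/1     Running", "", 0)


def rule_based_reflector(query: str, result: ToolResult, context: List[str]) -> str:
    # Stand-in for the LLM: apply a fix only if a retrieved doc supports it.
    if "unknown flag: --namespaces" in result.stderr and any("--namespace" in d for d in context):
        return result.command.replace("--namespaces", "--namespace")
    return result.command  # no change means "accept"


if __name__ == "__main__":
    cmd, res = reflect_and_repair(
        "list pods in prod", "kubectl get pods --namespaces prod",
        fake_kubectl, rule_based_reflector, DOCS,
    )
    print(cmd)  # kubectl get pods --namespace prod
    print(res.exit_code)  # 0
```

Swapping `rule_based_reflector` for an LLM call and `retrieve` for a vector store leaves the loop unchanged; `max_rounds` bounds the number of repair attempts so a reflector that keeps proposing new commands cannot loop forever.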
