SE AI · Mar 12

Schema First Tool APIs for LLM Agents: A Controlled Study of Tool Misuse, Recovery, and Budgeted Performance

arXiv:2603.13404 15.02 citations

AI Analysis

This work addresses tool misuse and reliability issues for LLM agents in software engineering tasks, but the contribution is incremental because it builds on existing interface formalization methods.

The paper tests whether schema-based tool contracts and structured validation diagnostics improve LLM agent reliability under strict interaction budgets. Schema conditions reduced interface misuse but not semantic misuse, and task success remained zero across all conditions.

Tool use has become central to modern LLM agents, yet interface design is rarely isolated as an experimental variable. This paper studies whether schema-based tool contracts and structured validation diagnostics improve reliability under strict interaction budgets. We evaluate three conditions that preserve identical tool semantics and information content: free-form documentation, JSON Schema specifications, and JSON Schema with structured diagnostics. We implement a deterministic software engineering sandbox with logs, metrics, configurations, and repository tasks, and evaluate a fully crossed pilot with one open local model, three seeds, three interface conditions, and four budgets. We report end-task success, interface misuse, execution failures, semantic misuse, recovery behavior, and overhead. In this pilot, success remains zero across conditions, while schema conditions reduce interface misuse but not semantic misuse. The evidence supports a precise interpretation: interface formalization improves contract adherence, but semantic action quality and timeout-sensitive tasks remain the dominant bottlenecks under constrained local inference.
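To make the third condition (JSON Schema with structured diagnostics) concrete, here is a minimal sketch of a schema-checked tool call that returns machine-readable diagnostics. It is an illustration under stated assumptions, not the paper's implementation: the `read_log` contract, the `validate_call` helper, and the diagnostic fields (`field`, `code`, `hint`) are all hypothetical, and the hand-written checker covers only a small subset of JSON Schema.

```python
from typing import Any, Dict, List

# Hypothetical contract for a log-reading tool, written in JSON Schema style.
# The tool name and fields are assumptions for illustration only.
READ_LOG_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "path": {"type": "string"},
        "max_lines": {"type": "integer", "minimum": 1},
    },
    "required": ["path"],
    "additionalProperties": False,
}

# Only the JSON types used by the contract above are mapped.
_JSON_TYPES = {"string": str, "integer": int}


def validate_call(args: Dict[str, Any], schema: Dict[str, Any]) -> List[Dict[str, str]]:
    """Return structured diagnostics; an empty list means the call is well-formed."""
    diagnostics: List[Dict[str, str]] = []
    props = schema.get("properties", {})

    # Required fields that the caller left out.
    for name in schema.get("required", []):
        if name not in args:
            diagnostics.append({"field": name, "code": "missing_required",
                                "hint": f"add the '{name}' argument"})

    for name, value in args.items():
        if name not in props:
            if schema.get("additionalProperties", True) is False:
                diagnostics.append({"field": name, "code": "unknown_field",
                                    "hint": f"allowed fields: {sorted(props)}"})
            continue
        spec = props[name]
        expected = _JSON_TYPES[spec["type"]]
        # bool is a subclass of int in Python, so reject it explicitly for integers.
        is_bool_as_int = spec["type"] == "integer" and isinstance(value, bool)
        if not isinstance(value, expected) or is_bool_as_int:
            diagnostics.append({"field": name, "code": "type_mismatch",
                                "hint": f"expected {spec['type']}"})
            continue
        if "minimum" in spec and value < spec["minimum"]:
            diagnostics.append({"field": name, "code": "below_minimum",
                                "hint": f"must be >= {spec['minimum']}"})
    return diagnostics


# Interface misuse: wrong argument type, caught by the contract.
print(validate_call({"path": "app.log", "max_lines": "50"}, READ_LOG_SCHEMA))
# Possible semantic misuse: well-formed call that may target the wrong file.
print(validate_call({"path": "unrelated_service.log"}, READ_LOG_SCHEMA))
```

In this sketch the second call produces no diagnostics even if it reads the wrong log, which mirrors the distinction the abstract draws: a contract can flag malformed calls, but it cannot tell whether a well-formed call was the right action.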

