CANTANTE: Optimizing Agentic Systems via Contrastive Credit Attribution

arXiv:2605.1329581.1

Predicted impact: top 66% in CL · last 90 days

Originality: Highly original

AI Analysis

This work addresses the challenge of automating configuration in multi-agent systems, a key bottleneck for practitioners who deploy them.

CANTANTE is a framework for optimizing LLM-based multi-agent systems by decomposing system-level rewards into per-agent update signals via contrastive credit attribution. It achieves the best average rank among the evaluated optimizers across programming, math, and QA benchmarks, improving over the strongest baseline by +18.9 percentage points on MBPP and +12.5 percentage points on GSM8K.

LLM-based multi-agent systems have demonstrated strong performance across complex real-world tasks, such as software engineering, predictive modeling, and retrieval-augmented generation. Yet automating their configuration remains a structural challenge, as scores are available only at the system level, whereas the parameters governing agent behavior are local. We argue that optimizing these systems is fundamentally a credit-assignment problem. We therefore introduce CANTANTE, a framework that decomposes system-level rewards into per-agent update signals by contrasting rollouts of multiple joint configurations on the same query. We instantiate it for prompt optimization, treating agent prompts as learnable system parameters. We evaluate CANTANTE against GEPA and MIPROv2 on programming (MBPP), mathematical reasoning (GSM8K), and multi-hop question answering (HotpotQA). Across these benchmarks, CANTANTE achieves the best average rank among all evaluated optimizers and consistently outperforms unoptimized prompts. It improves over the strongest baseline by +18.9 percentage points on MBPP and +12.5 percentage points on GSM8K, while incurring a lower inference cost. It remains within one standard deviation of the strongest baseline on HotpotQA. Crucially, our credit correlation analysis confirms that the attributer produces meaningful per-agent signals rather than echoing the global system score.
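To make the idea of contrasting rollouts on the same query more concrete, here is a minimal sketch in Python. It is not the paper's attributer, whose details the abstract does not give. It assumes a simple marginal contrast: an agent's prompt variant is credited with the mean system score of the rollouts that used it, minus the mean score over all rollouts of that query. The function name `contrastive_credit`, the agent names `planner` and `coder`, the variant ids, and the scores are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from statistics import mean


def contrastive_credit(rollouts):
    """Split system-level scores for ONE query into per-agent credits.

    rollouts: list of (config, score) pairs, where config maps an agent
    name to the id of the prompt variant it used, and score is the
    system-level reward of that rollout.

    Assumed estimator (not taken from the paper): the credit of a variant
    is the mean score of rollouts that used it minus the mean score of
    all rollouts on this query.
    """
    baseline = mean(score for _, score in rollouts)

    # Group scores by (agent, variant) so each choice can be contrasted
    # against the query-level baseline.
    by_choice = defaultdict(lambda: defaultdict(list))
    for config, score in rollouts:
        for agent, variant in config.items():
            by_choice[agent][variant].append(score)

    return {
        agent: {v: mean(s) - baseline for v, s in variants.items()}
        for agent, variants in by_choice.items()
    }


# Hypothetical two-agent system: four joint configurations, one query.
rollouts = [
    ({"planner": "p0", "coder": "c0"}, 0.0),
    ({"planner": "p0", "coder": "c1"}, 1.0),
    ({"planner": "p1", "coder": "c0"}, 0.0),
    ({"planner": "p1", "coder": "c1"}, 1.0),
]
credits = contrastive_credit(rollouts)
print(credits)
```

In this toy data the system score depends only on the coder's prompt, so both planner variants receive a credit of 0.0 while the coder variants receive -0.5 and +0.5. This is the kind of per-agent signal that differs from simply echoing the global score to every agent.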

