Agent Primitives: Reusable Latent Building Blocks for Multi-Agent Systems
This work addresses the challenge of building efficient, reusable multi-agent systems (MAS) for AI applications. Its approach reduces manual design effort and improves performance, although the contribution is incremental in that it builds on existing MAS architectures.
The paper tackles the problem of task-specific, manually crafted multi-agent systems (MAS) that suffer from limited reusability and instability in long-context interactions, proposing Agent Primitives as reusable latent building blocks; experiments show improvements of 12.0-16.5% in accuracy over single-agent baselines and 3x-4x reductions in token usage and latency compared to text-based MAS.
While existing multi-agent systems (MAS) can handle complex problems by enabling collaboration among multiple agents, they are often highly task-specific, relying on manually crafted agent roles and interaction prompts, which leads to increased architectural complexity and limited reusability across tasks. Moreover, most MAS communicate primarily through natural language, making them vulnerable to error accumulation and instability in long-context, multi-stage interactions within internal agent histories. In this work, we propose \textbf{Agent Primitives}, a set of reusable latent building blocks for LLM-based MAS. Inspired by neural network design, where complex models are built from reusable components, we observe that many existing MAS architectures can be decomposed into a small number of recurring internal computation patterns. Based on this observation, we instantiate three primitives: Review, Voting and Selection, and Planning and Execution. All primitives communicate internally via the key-value (KV) cache, which improves both robustness and efficiency by mitigating information degradation across multi-stage interactions. To enable automatic system construction, an Organizer agent selects and composes primitives for each query, guided by a lightweight knowledge pool of previously successful configurations, forming a primitive-based MAS. Experiments show that primitive-based MAS improve average accuracy by 12.0-16.5\% over single-agent baselines and reduce token usage and inference latency by approximately 3$\times$-4$\times$ compared to text-based MAS, while incurring only 1.3$\times$-1.6$\times$ overhead relative to single-agent inference and providing more stable performance across model backbones.
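The Organizer step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Organizer` class, the word-overlap retrieval rule, the default fallback, and the `State` dictionary are all hypothetical names and choices made for the example. In particular, the dictionary only stands in for the latent state; the paper's primitives exchange KV-cache contents, which requires access to model internals and is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for the latent state passed between primitives.
# In the paper this role is played by the KV cache; here it is a plain dict.
State = Dict[str, List[str]]
Primitive = Callable[[State], State]


def review(state: State) -> State:
    """Toy Review primitive: only records that it ran."""
    return {**state, "trace": state["trace"] + ["review"]}


def voting_and_selection(state: State) -> State:
    """Toy Voting and Selection primitive: only records that it ran."""
    return {**state, "trace": state["trace"] + ["voting_and_selection"]}


def planning_and_execution(state: State) -> State:
    """Toy Planning and Execution primitive: only records that it ran."""
    return {**state, "trace": state["trace"] + ["planning_and_execution"]}


PRIMITIVES: Dict[str, Primitive] = {
    "review": review,
    "voting_and_selection": voting_and_selection,
    "planning_and_execution": planning_and_execution,
}


@dataclass
class Organizer:
    # Knowledge pool: (past query, primitive sequence that worked for it).
    knowledge_pool: List[Tuple[str, List[str]]] = field(default_factory=list)

    def select(self, query: str) -> List[str]:
        """Reuse the configuration of the past query sharing the most words."""
        if not self.knowledge_pool:
            return ["review"]  # arbitrary default, an assumption of this sketch
        words = set(query.lower().split())
        best = max(
            self.knowledge_pool,
            key=lambda entry: len(words & set(entry[0].lower().split())),
        )
        return best[1]

    def run(self, query: str) -> State:
        """Compose the selected primitives by threading the state through them."""
        state: State = {"query": [query], "trace": []}
        for name in self.select(query):
            state = PRIMITIVES[name](state)
        return state


organizer = Organizer(
    knowledge_pool=[
        ("prove this math inequality", ["planning_and_execution", "review"]),
        ("choose the best answer to this trivia question", ["voting_and_selection"]),
    ]
)
result = organizer.run("prove the triangle inequality")
print(result["trace"])  # ['planning_and_execution', 'review']
```

The point of the sketch is the separation of concerns: primitives share one interface, so the Organizer can compose them per query without task-specific prompts, and the knowledge pool is just a lookup over past configurations. A real system would replace the word-overlap rule with whatever retrieval the Organizer agent actually uses.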