AI | Jul 18, 2025

Adaptive Multi-Agent Reasoning via Automated Workflow Generation

Humza Sami, Mubashir ul Islam, Pierre-Emmanuel Gaillardon, Valerio Tenace

arXiv:2507.14393v1 | 5.81 citations | h-index: 8 | Has Code

Originality: Incremental advance

AI Analysis

The paper addresses a critical limitation in AI reasoning systems, one that matters to users who need robust problem-solving. It appears incremental, however, because it is an enhanced iteration of an existing framework, Nexus.

The paper tackles poor generalization in Large Reasoning Models (LRMs) by introducing Nexus Architect, a multi-agent system with automated workflow generation. Running on an off-the-shelf, non-reasoning model, it reports up to a 66% increase in pass rate over Gemini 2.5 Flash Preview, with larger gains over other state-of-the-art LRMs.

The rise of Large Reasoning Models (LRMs) promises a significant leap forward in language model capabilities, aiming to tackle increasingly sophisticated tasks with unprecedented efficiency and accuracy. However, despite their impressive performance, recent studies have highlighted how current reasoning models frequently fail to generalize to novel, unseen problems, often resorting to memorized solutions rather than genuine inferential reasoning. Such behavior underscores a critical limitation in modern LRMs, i.e., their tendency toward overfitting, which in turn results in poor generalization in problem-solving capabilities. In this paper, we introduce Nexus Architect, an enhanced iteration of our multi-agent system framework, Nexus, equipped with a novel automated workflow synthesis mechanism. Given a user's prompt and a small set of representative examples, the Architect autonomously generates a tailored reasoning workflow by selecting suitable strategies, tool integrations, and adversarial techniques for a specific problem class. Furthermore, the Architect includes an iterative prompt refinement mechanism that fine-tunes agents' system prompts to maximize performance and improve the generalization capabilities of the system. We empirically evaluate Nexus Architect by employing an off-the-shelf, non-reasoning model on a custom dataset of challenging logical questions and compare its performance against state-of-the-art LRMs. Results show that Nexus Architect consistently outperforms existing solutions, achieving up to a 66% increase in pass rate over Gemini 2.5 Flash Preview, nearly 2.5× against Claude Sonnet 4 and DeepSeek-R1, and over 3× w.r.t. Llama 4 Scout.
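The abstract describes the iterative prompt refinement mechanism only at a high level. As a rough illustration of the general idea, and not the authors' implementation, the sketch below scores a system prompt by its pass rate on a few examples and keeps a revised prompt only when the score improves. Every name here (`pass_rate`, `refine_prompt`, `toy_agent`, `toy_propose`) is hypothetical, and the toy agent is a stand-in for a real model call.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, str]            # (question, expected answer)
Agent = Callable[[str, str], str]    # (system_prompt, question) -> answer
Proposer = Callable[[str, List[Example]], str]  # (prompt, failures) -> new prompt


def pass_rate(agent: Agent, system_prompt: str, examples: List[Example]) -> float:
    """Fraction of examples answered exactly right under this system prompt."""
    if not examples:
        return 0.0
    passed = sum(1 for q, expected in examples
                 if agent(system_prompt, q).strip() == expected)
    return passed / len(examples)


def refine_prompt(agent: Agent, propose: Proposer, initial_prompt: str,
                  examples: List[Example], max_rounds: int = 5) -> Tuple[str, float]:
    """Greedy refinement: accept a candidate prompt only if it scores higher."""
    best_prompt = initial_prompt
    best_score = pass_rate(agent, best_prompt, examples)
    for _ in range(max_rounds):
        if best_score == 1.0:
            break  # nothing left to fix on this example set
        # Collect the examples the current best prompt still gets wrong.
        failures = [(q, a) for q, a in examples
                    if agent(best_prompt, q).strip() != a]
        candidate = propose(best_prompt, failures)
        score = pass_rate(agent, candidate, examples)
        if score > best_score:
            best_prompt, best_score = candidate, score
    return best_prompt, best_score


def toy_agent(system_prompt: str, question: str) -> str:
    # Assumption: stand-in for a model; it adds correctly only when
    # the system prompt tells it to work carefully.
    a, b = map(int, question.split("+"))
    return str(a + b) if "carefully" in system_prompt else "0"


def toy_propose(prompt: str, failures: List[Example]) -> str:
    # Assumption: stand-in for an LLM-driven prompt rewriter.
    return prompt + " Work carefully." if failures else prompt
```

Calling `refine_prompt(toy_agent, toy_propose, "You are a solver.", [("1+2", "3"), ("2+2", "4")])` exercises the loop end to end. In a real system the example set used for refinement would need to be kept separate from the evaluation set; otherwise the refined prompt could overfit in the same way the abstract criticizes.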
