AI, CL | Jul 19, 2025

Routine: A Structural Planning Framework for LLM Agent System in Enterprise

Guancheng Zeng, Xueyi Chen, Jiawang Hu, Shaohua Qi, Yaxuan Mao, Zhantao Wang, Yifan Nie, Shuang Li, Qiuyang Feng, Pengxu Qiu, Yujia Wang, Wenqiang Han

arXiv:2507.14447v2 | 14.77 citations | h-index: 3

Originality: Incremental advance

AI Analysis

The paper addresses the challenge of deploying stable agent systems in enterprise settings and offers a practical way to supply domain-specific process knowledge. The advance appears incremental, since it builds on existing agent planning methods.

The paper tackles the problem of disorganized plans and poor execution stability in LLM agent systems for enterprise environments by introducing Routine, a structural planning framework, which significantly increased execution accuracy from 41.1% to 96.3% for GPT-4o and from 32.6% to 83.3% for Qwen3-14B in real-world evaluations.

The deployment of agent systems in an enterprise environment is often hindered by several challenges: common models lack domain-specific process knowledge, leading to disorganized plans, missing key tools, and poor execution stability. To address this, this paper introduces Routine, a multi-step agent planning framework designed with a clear structure, explicit instructions, and seamless parameter passing to guide the agent's execution module in performing multi-step tool-calling tasks with high stability. In evaluations conducted within a real-world enterprise scenario, Routine significantly improves the execution accuracy of model tool calls, raising GPT-4o from 41.1% to 96.3% and Qwen3-14B from 32.6% to 83.3%. We further constructed a Routine-following training dataset and fine-tuned Qwen3-14B, which raised its accuracy to 88.2% on scenario-specific evaluations, indicating improved adherence to execution plans. In addition, we employed Routine-based distillation to create a scenario-specific, multi-step tool-calling dataset. Fine-tuning on this distilled dataset raised the model's accuracy to 95.5%, approaching GPT-4o's performance. These results highlight Routine's effectiveness in distilling domain-specific tool-usage patterns and enhancing model adaptability to new scenarios. Our experimental results demonstrate that Routine provides a practical and accessible approach to building stable agent workflows, accelerating the deployment and adoption of agent systems in enterprise environments, and advancing the technical vision of AI for Process.
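The abstract describes a plan with a clear structure, explicit instructions, and parameter passing between tool calls, but does not show the format. The following is a minimal illustrative sketch of that general idea, not the paper's actual Routine format or implementation. The `Step` class, the `$step.key` reference syntax, the `run_routine` function, and the two toy tools are all hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Step:
    """One step of a hypothetical routine: which tool to call and with what inputs."""
    name: str
    tool: str
    # Each argument is either a literal value or a "$step.key" reference
    # to a field in the output of an earlier step (assumed syntax).
    args: Dict[str, Any] = field(default_factory=dict)


def resolve(value: Any, memory: Dict[str, Dict[str, Any]]) -> Any:
    """Replace a "$step.key" reference with the value stored by an earlier step."""
    if isinstance(value, str) and value.startswith("$"):
        step_name, key = value[1:].split(".", 1)
        return memory[step_name][key]
    return value


def run_routine(
    steps: List[Step],
    tools: Dict[str, Callable[..., Dict[str, Any]]],
) -> Dict[str, Dict[str, Any]]:
    """Execute steps in order, passing earlier outputs into later tool calls."""
    memory: Dict[str, Dict[str, Any]] = {}
    for step in steps:
        kwargs = {k: resolve(v, memory) for k, v in step.args.items()}
        memory[step.name] = tools[step.tool](**kwargs)
    return memory


# Toy stand-ins for enterprise tools (hypothetical, for illustration only).
def lookup_employee(name: str) -> Dict[str, Any]:
    return {"employee_id": f"E-{len(name)}"}


def get_leave_balance(employee_id: str) -> Dict[str, Any]:
    return {"employee_id": employee_id, "days": 12}


TOOLS = {
    "lookup_employee": lookup_employee,
    "get_leave_balance": get_leave_balance,
}

ROUTINE = [
    Step("find", "lookup_employee", {"name": "Alice"}),
    Step("balance", "get_leave_balance", {"employee_id": "$find.employee_id"}),
]

result = run_routine(ROUTINE, TOOLS)
print(result["balance"])
```

In this sketch the plan is fixed data, so the executor never has to invent the tool order or guess an argument; in the system the abstract describes, a model would follow such a plan rather than a deterministic loop.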
