AI · LG · Feb 2

FlowSteer: Interactive Agentic Workflow Orchestration via End-to-End Reinforcement Learning

arXiv:2602.01664v1 · 1 citation · h-index: 7
Originality: Incremental advance
AI Analysis

The paper addresses automated workflow orchestration for agentic systems. The contribution appears incremental, since it builds on existing reinforcement learning and workflow methods.

The paper targets the high manual cost and sparse reward signals of agentic workflow orchestration. It proposes FlowSteer, an end-to-end reinforcement learning framework that automates orchestration through multi-turn interaction. The reported experiments on twelve datasets show that it significantly outperforms baselines.

In recent years, a variety of powerful agentic workflows have been applied to solve a wide range of human problems. However, existing workflow orchestration still faces key challenges, including high manual cost, reliance on specific operators/large language models (LLMs), and sparse reward signals. To address these challenges, we propose FlowSteer, an end-to-end reinforcement learning framework that pairs a lightweight policy model, acting as the agent, with an executable canvas environment, automating workflow orchestration through multi-turn interaction. In this process, the policy model analyzes execution states and selects editing actions, while the canvas executes operators and returns feedback for iterative refinement. Moreover, FlowSteer provides a plug-and-play framework that supports diverse operator libraries and interchangeable LLM backends. To effectively train this interaction paradigm, we propose Canvas Workflow Relative Policy Optimization (CWRPO), which introduces diversity-constrained rewards with conditional release to stabilize learning and suppress shortcut behaviors. Experimental results on twelve datasets show that FlowSteer significantly outperforms baselines across various tasks.
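
The multi-turn interaction described in the abstract (a policy reads the execution state, selects an editing action, and the canvas executes the workflow and returns feedback) can be illustrated with a toy loop. This is a minimal sketch under stated assumptions, not FlowSteer's implementation: `Canvas`, `OPERATORS`, `scripted_policy`, and `orchestrate` are hypothetical names, the operators are simple string functions, and a hand-written rule stands in for the learned policy model. No reinforcement learning or CWRPO reward is modeled here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical operator library (assumption): operator name -> string function.
OPERATORS: Dict[str, Callable[[str], str]] = {
    "strip": str.strip,
    "lower": str.lower,
    "dedupe_spaces": lambda s: " ".join(s.split()),
}

Action = Tuple[str, str]  # (edit kind, operator name), e.g. ("add", "strip")


@dataclass
class Canvas:
    """Toy executable canvas: stores a linear workflow and can run it."""
    steps: List[str] = field(default_factory=list)

    def apply(self, action: Action) -> None:
        """Apply one editing action to the workflow."""
        kind, name = action
        if name not in OPERATORS:
            raise KeyError(f"unknown operator: {name}")
        if kind == "add":
            self.steps.append(name)
        elif kind == "remove":
            self.steps.remove(name)
        else:
            raise ValueError(f"unknown edit kind: {kind}")

    def execute(self, task_input: str) -> str:
        """Run every operator in order and return the result as feedback."""
        out = task_input
        for name in self.steps:
            out = OPERATORS[name](out)
        return out


def scripted_policy(feedback: str, steps: List[str]) -> Optional[Action]:
    """Rule-based stand-in for the learned policy (assumption).

    Inspects the latest execution feedback and proposes one edit,
    or returns None when no further edit looks useful.
    """
    if feedback != feedback.strip() and "strip" not in steps:
        return ("add", "strip")
    if feedback != feedback.lower() and "lower" not in steps:
        return ("add", "lower")
    if "  " in feedback and "dedupe_spaces" not in steps:
        return ("add", "dedupe_spaces")
    return None


def orchestrate(
    task_input: str,
    policy: Callable[[str, List[str]], Optional[Action]],
    max_turns: int = 5,
) -> Tuple[List[str], str]:
    """Multi-turn loop: observe feedback, edit the canvas, re-execute."""
    canvas = Canvas()
    feedback = canvas.execute(task_input)  # empty workflow returns the input
    for _ in range(max_turns):
        action = policy(feedback, canvas.steps)
        if action is None:  # the policy chooses to stop editing
            break
        canvas.apply(action)
        feedback = canvas.execute(task_input)
    return canvas.steps, feedback


if __name__ == "__main__":
    workflow, result = orchestrate("  Hello   World ", scripted_policy)
    print(workflow, repr(result))
```

In this sketch the operator library is a plain dictionary, so swapping in a different set of operators only requires changing `OPERATORS`. That loosely mirrors the plug-and-play property the abstract claims, although the paper's actual interfaces are not described in this summary.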

