AI | LG | Feb 11

Pushing Forward Pareto Frontiers of Proactive Agents with Behavioral Agentic Optimization

arXiv:2602.11351v1
Originality: Incremental advance
AI Analysis

This work addresses the problem of training user-aligned proactive agents for real-world applications, representing an incremental improvement over existing methods.

The paper tackles the challenge of balancing task performance with user engagement in proactive LLM agents by proposing BAO, an agentic RL framework that combines behavior enhancement and regularization, achieving superior performance over baselines and comparable results to commercial agents on the UserRL benchmark.

Proactive large language model (LLM) agents aim to actively plan, query, and interact over multiple turns, enabling efficient task completion beyond passive instruction following and making them essential for real-world, user-centric applications. Agentic reinforcement learning (RL) has recently emerged as a promising solution for training such agents in multi-turn settings, allowing interaction strategies to be learned from feedback. However, existing pipelines face a critical challenge in balancing task performance with user engagement: passive agents cannot efficiently adapt to users' intentions, while overuse of human feedback reduces user satisfaction. To address this trade-off, we propose BAO, an agentic RL framework that combines behavior enhancement, which enriches proactive reasoning and information-gathering capabilities, with behavior regularization, which suppresses inefficient or redundant interactions and aligns agent behavior with user expectations. We evaluate BAO on multiple tasks from the UserRL benchmark suite and demonstrate that it substantially outperforms proactive agentic RL baselines while achieving comparable or even superior performance to commercial LLM agents, highlighting its effectiveness for training proactive, user-aligned LLM agents in complex multi-turn scenarios. Our website: https://proactive-agentic-rl.github.io/.
