Proactive Agent Research Environment: Simulating Active Users to Evaluate Proactive Assistants
This work addresses a bottleneck in proactive assistant development, the lack of realistic user simulation, by providing researchers and developers with a simulation framework and a benchmark.
The paper addresses the lack of realistic user simulation frameworks for developing proactive agents. It introduces Pare, which models applications as finite state machines to enable active user simulation, and Pare-Bench, a benchmark of 143 tasks for evaluating these agents.
Proactive agents that anticipate user needs and autonomously execute tasks hold great promise as digital assistants, yet the lack of realistic user simulation frameworks hinders their development. Existing approaches model apps as flat tool-calling APIs, which fails to capture the stateful and sequential nature of user interaction in digital environments and makes realistic user simulation infeasible. We introduce the Proactive Agent Research Environment (Pare), a framework for building and evaluating proactive agents in digital environments. Pare models applications as finite state machines with stateful navigation and a state-dependent action space for the user simulator, enabling active user simulation. Building on this foundation, we present Pare-Bench, a benchmark of 143 diverse tasks spanning communication, productivity, scheduling, and lifestyle apps, designed to test context observation, goal inference, intervention timing, and multi-app orchestration.
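To make the contrast with flat tool-calling APIs concrete, the following is a minimal sketch of an app modeled as a finite state machine with a state-dependent action space. It is not Pare's actual API: the class `AppFSM`, the helper `make_messaging_app`, and the screen and action names are all hypothetical, chosen only to illustrate the idea that the actions a simulated user can take depend on the screen they are currently on.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AppFSM:
    """Hypothetical app modeled as a finite state machine (not Pare's real API)."""

    state: str
    # transitions[state][action] gives the state reached by taking that action.
    transitions: dict[str, dict[str, str]] = field(default_factory=dict)

    def available_actions(self) -> list[str]:
        """Return the actions valid in the current state, sorted by name."""
        return sorted(self.transitions.get(self.state, {}))

    def step(self, action: str) -> str:
        """Apply an action; reject it if it is not valid in the current state."""
        allowed = self.transitions.get(self.state, {})
        if action not in allowed:
            raise ValueError(f"{action!r} is not available in state {self.state!r}")
        self.state = allowed[action]
        return self.state


def make_messaging_app() -> AppFSM:
    """Build an illustrative messaging app with three screens (assumed names)."""
    return AppFSM(
        state="inbox",
        transitions={
            "inbox": {"open_thread": "thread", "compose": "draft"},
            "thread": {"reply": "draft", "back": "inbox"},
            "draft": {"send": "inbox", "discard": "inbox"},
        },
    )


if __name__ == "__main__":
    app = make_messaging_app()
    print(app.state, app.available_actions())
    app.step("open_thread")
    print(app.state, app.available_actions())
```

In this sketch, a simulated user on the inbox screen cannot call `send` directly; they must first navigate to a draft. A flat tool-calling API would expose `send` unconditionally, which is the gap the abstract describes.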