AI · Mar 9

PIRA-Bench: A Transition from Reactive GUI Agents to GUI-based Proactive Intent Recommendation Agents

arXiv:2603.08013v1 · 5 citations
Predicted impact: top 4% in AI · last 90 days
Originality: Highly original
AI Analysis

This work addresses the critical need for proactive AI assistants that can anticipate user intentions from visual inputs, benefiting users by offering timely recommendations without explicit prompting. It's an initial step towards more robust and intelligent personal assistants.

The paper introduces PIRA-Bench, a new benchmark for evaluating multimodal large language models (MLLMs) in proactive GUI environments. This benchmark addresses the challenge of anticipating user intentions from continuous visual inputs, moving beyond reactive GUI agents. It features complex, non-linear trajectories with interleaved intents and noisy segments, aiming to foster the development of proactive AI assistants.

Current Graphical User Interface (GUI) agents operate primarily under a reactive paradigm: a user must provide an explicit instruction for the agent to execute a task. However, an intelligent AI assistant should be proactive: capable of anticipating user intentions directly from continuous visual inputs, such as mobile or desktop screenshots, and of offering timely recommendations without explicit user prompting. Transitioning to this proactive paradigm presents significant challenges. Real-world screen activity is rarely linear; it consists of long-horizon trajectories fraught with noisy browsing, meaningless actions, and multithreaded task-switching. To address this gap, we introduce PIRA-Bench (Proactive Intent Recommendation Agent Benchmark), a novel benchmark for evaluating multimodal large language models (MLLMs) on continuous, weakly supervised visual inputs. Unlike reactive datasets, PIRA-Bench features complex trajectories with multiple interleaved intents and noisy segments with various user profile contexts, challenging agents to detect actionable events while fitting to user preferences. Furthermore, we propose the PIRF baseline, a memory-aware, state-tracking framework that empowers general MLLMs to manage multiple task threads and handle misleading visual inputs. PIRA-Bench serves as an initial step toward robust and proactive GUI-based personal assistants.

