AI | Dec 10, 2024

PAFFA: Premeditated Actions For Fast Agents

Shambhavi Krishna, Zheng Chen, Yuan Ling, Xiaojiang Huang, Yingjie Li, Fan Yang, Xiang Li

arXiv:2412.07958v2 | 1 citation | h-index: 1

Originality: Incremental advance

AI Analysis

This work addresses computational inefficiency in AI assistants for web interaction, offering a method to scale inference-time techniques to internet-scale data, though its contribution is an incremental optimization of existing approaches.

The paper tackles slow and error-prone LLM-driven HTML parsing in web interaction tasks by introducing PAFFA, which pre-computes browser interaction patterns. This reduces inference-time tokens by 87% while improving step accuracy from 0.50 (baseline) to 0.57.

Modern AI assistants have made significant progress in natural language understanding and tool use, with emerging efforts to interact with web interfaces. However, current approaches that rely heavily on repeated LLM-driven HTML parsing are computationally expensive and error-prone, particularly when handling dynamic web interfaces and multi-step tasks. We introduce PAFFA (Premeditated Actions For Fast Agents), a method that makes LLMs faster and more accurate in completing tasks on the internet using a novel inference-time technique that requires no task-specific training. PAFFA constructs an 'Action Library', leveraging the parametric knowledge of the base LLM to pre-compute browser interaction patterns that generalize across tasks. By strategically re-using LLM inference across tasks (either via 'Dist-Map' for task-agnostic identification of key interactive web elements, or 'Unravel' for first-encounter, stateful exploration of novel tasks/sites), PAFFA drastically reduces inference-time tokens by 87% while maintaining robust performance (0.57 step accuracy vs. 0.50 for the baseline). Further, Unravel's ability to update its action library based on explorations allows generalization and adaptation to unseen websites. In sum, this work shows that LLM reasoning sequences can generalize across prompts, offering a way to scale inference-time techniques for internet-scale data with sublinear token count.
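
The reuse idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `ActionLibrary` class, the `Planner` type, and `fake_planner` (a stand-in for an LLM call with a made-up token cost) are all hypothetical names chosen for illustration. The sketch only shows why token cost can grow sublinearly with the number of tasks when a computed interaction pattern is stored and reused.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical type: a planner maps (site, intent) to a list of browser steps
# and reports how many tokens the underlying LLM call consumed.
Planner = Callable[[str, str], Tuple[List[str], int]]


@dataclass
class ActionLibrary:
    """Caches interaction patterns so each (site, intent) pair is planned once."""
    planner: Planner
    _store: Dict[Tuple[str, str], List[str]] = field(default_factory=dict)
    tokens_spent: int = 0

    def get_actions(self, site: str, intent: str) -> List[str]:
        key = (site, intent)
        if key not in self._store:
            # First encounter: pay for one planner call and remember the result.
            steps, tokens = self.planner(site, intent)
            self._store[key] = steps
            self.tokens_spent += tokens
        # Later encounters reuse the stored pattern at no extra token cost.
        return self._store[key]


def fake_planner(site: str, intent: str) -> Tuple[List[str], int]:
    """Stand-in for an LLM call; the step strings and token cost are made up."""
    return [f"open {site}", f"click #{intent}-button"], 1000


library = ActionLibrary(planner=fake_planner)
for _ in range(5):
    actions = library.get_actions("example.com", "search")
```

In this toy setup, five requests for the same site and intent trigger a single planner call, so `library.tokens_spent` stays at the cost of one call rather than five. How PAFFA actually builds, keys, and updates its Action Library is described in the paper itself.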
