Agent-E: From Autonomous Web Navigation to Foundational Design Principles in Agentic Systems
This work addresses the need for better design patterns for AI agents, particularly in web navigation, and distills its findings into broader principles. The contribution is incremental in that it builds on prior web agents.
The paper tackles the problem of designing capable web agents by introducing Agent-E, which achieves state-of-the-art performance on the WebVoyager benchmark, beating other agents by 10-30% in most categories. It then generalizes these findings into foundational design principles for agentic systems.
AI agents are changing the way work gets done, in both consumer and enterprise domains. However, the design patterns and architectures for building highly capable agents or multi-agent systems are still developing, and our understanding of the implications of various design choices and algorithms is still evolving. In this paper, we present our work on building a novel web agent, Agent-E\footnote{Our code is available at \url{https://github.com/EmergenceAI/Agent-E}}. Agent-E introduces several architectural improvements over prior state-of-the-art web agents, such as a hierarchical architecture, a flexible DOM distillation and denoising method, and the concept of \textit{change observation} to guide the agent towards more accurate performance. We first present an evaluation of Agent-E on the WebVoyager benchmark and show that Agent-E beats other SOTA text and multi-modal web agents in most categories by 10--30\%. We then synthesize our learnings from the development of Agent-E into general design principles for developing agentic systems. These include the use of domain-specific primitive skills, the importance of distilling and denoising environmental observations, the advantages of a hierarchical architecture, and the role of agentic self-improvement in enhancing agent efficiency and efficacy as the agent gathers experience.
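To make the idea of DOM distillation and denoising more concrete, the following is a minimal sketch of what such a step could look like. It is not Agent-E's implementation: the class `DomDistiller`, the function `distill`, and the choice of which tags and attributes to keep are all assumptions made purely for illustration. The sketch reduces a raw HTML page to a short list of interactive elements, dropping scripts, layout markup, and non-essential attributes.

```python
from html.parser import HTMLParser

# Hypothetical filter sets chosen for illustration; not taken from Agent-E.
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}
KEPT_ATTRS = {"id", "name", "type", "href", "placeholder", "aria-label"}
VOID_TAGS = {"input"}  # interactive tags that never have a closing tag


class DomDistiller(HTMLParser):
    """Collects interactive elements and discards everything else."""

    def __init__(self):
        super().__init__()
        self.elements = []
        self._open = None  # interactive element currently collecting text

    def handle_starttag(self, tag, attrs):
        if tag not in INTERACTIVE_TAGS:
            return  # denoise: ignore layout, script, and style tags
        element = {"tag": tag, "text": ""}
        # Keep only a small set of attributes that help identify the element.
        element.update({k: v for k, v in attrs if k in KEPT_ATTRS and v})
        self.elements.append(element)
        self._open = None if tag in VOID_TAGS else element

    def handle_endtag(self, tag):
        if self._open is not None and self._open["tag"] == tag:
            self._open = None

    def handle_data(self, data):
        # Text is kept only while inside an interactive element.
        if self._open is not None:
            joined = self._open["text"] + " " + data.strip()
            self._open["text"] = joined.strip()


def distill(html):
    """Return a compact list of interactive elements found in the HTML."""
    parser = DomDistiller()
    parser.feed(html)
    parser.close()
    return parser.elements
```

A distilled view like this is far shorter than the raw page, which is the general motivation for denoising observations before passing them to a language model. A real system would likely need more than this, for example visibility checks or support for different distilled views per task.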