AI | CL | Jun 17, 2025

OAgents: An Empirical Study of Building Effective Agents

arXiv:2506.15741v2 | 22 citations | h-index: 16 | Has Code | EMNLP
Originality: Incremental advance
AI Analysis

This work addresses inconsistent evaluation and the unclear impact of design choices in Agentic AI research, though its contribution is an incremental improvement on existing practices.

The paper tackles the lack of standardization and reproducibility in agent research through a systematic empirical study on benchmarks such as GAIA and BrowseComp. It identifies which design choices are crucial and introduces OAgents, a new framework that achieves state-of-the-art performance among open-source projects.

Recently, Agentic AI has become an increasingly popular research field. However, we argue that current agent research practices lack standardization and scientific rigor, making it hard to conduct fair comparisons among methods. As a result, it is still unclear how different design choices in agent frameworks affect effectiveness, and measuring their progress remains challenging. In this work, we conduct a systematic empirical study on the GAIA benchmark and BrowseComp to examine the impact of popular design choices in key agent components in a fair and rigorous manner. We find that the lack of a standard evaluation protocol makes previous works, even open-sourced ones, non-reproducible, with significant variance between random runs. Therefore, we introduce a more robust evaluation protocol to stabilize comparisons. Our study reveals which components and designs are crucial for effective agents and which are redundant despite seeming logical. Based on our findings, we build and open-source OAgents, a new foundation agent framework that achieves state-of-the-art performance among open-source projects. OAgents offers a modular design for various agent components, promoting future research in Agentic AI.

