AI · Aug 11, 2023

BOLAA: Benchmarking and Orchestrating LLM-augmented Autonomous Agents

Zhiwei Liu, Weiran Yao, Jianguo Zhang, Le Xue, Shelby Heinecke, Rithesh Murthy, Yihao Feng, Zeyuan Chen, Juan Carlos Niebles, Devansh Arpit, Ran Xu, Phil Mui

Apple, Salesforce, Stanford

arXiv:2308.05960v1 · 31.9 · 107 citations · h-index: 112 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the need for systematic evaluation and optimization of LAAs for researchers and practitioners in AI, though it is incremental as it builds on existing LAA concepts.

The authors tackled the problem of evaluating and improving LLM-augmented Autonomous Agents (LAAs) by benchmarking different architectures and LLM backbones, and they proposed BOLAA, a strategy to orchestrate multiple LAAs, achieving performance gains in decision-making and multi-step reasoning environments.

The massive success of large language models (LLMs) has encouraged the exploration of LLM-augmented Autonomous Agents (LAAs). An LAA generates actions with its core LLM and interacts with environments, which lets it resolve complex tasks by conditioning on past interactions such as observations and actions. Because the investigation of LAAs is still very recent, few explorations are available. Therefore, we provide a comprehensive comparison of LAAs in terms of both agent architectures and LLM backbones. Additionally, we propose a new strategy, BOLAA, to orchestrate multiple LAAs such that each labor LAA focuses on one type of action and a controller manages the communication among the agents. We conduct simulations on both decision-making and multi-step reasoning environments, which comprehensively justify the capacity of LAAs. Our performance results provide quantitative suggestions for designing LAA architectures, for choosing LLMs, and for assessing the compatibility of the two. We release our implementation code of LAAs to the public at https://github.com/salesforce/BOLAA.
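The orchestration idea in the abstract, a controller that routes each step to a labor agent specialized in one action type, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the names `LaborAgent`, `Controller`, `select_agent`, the `search[...]`/`click[...]` action format, and the stub policies standing in for LLM calls are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for an LLM call: maps a prompt string to an action argument.
Policy = Callable[[str], str]


@dataclass
class LaborAgent:
    """An agent restricted to a single action type (assumed: 'search' or 'click')."""
    action_type: str
    policy: Policy
    history: List[Tuple[str, str]] = field(default_factory=list)

    def act(self, observation: str) -> str:
        # Condition on past (observation, action) pairs plus the new observation.
        context = " | ".join(f"{obs} -> {act}" for obs, act in self.history)
        prompt = f"{context} | {observation}" if context else observation
        action = f"{self.action_type}[{self.policy(prompt)}]"
        self.history.append((observation, action))
        return action


@dataclass
class Controller:
    """Chooses which labor agent handles each observation and relays its action."""
    agents: Dict[str, LaborAgent]
    select: Callable[[str], str]  # assumed selection rule; could itself be an LLM

    def step(self, observation: str) -> str:
        agent = self.agents[self.select(observation)]
        return agent.act(observation)


def select_agent(observation: str) -> str:
    # Toy routing rule (an assumption): click once results are shown, else search.
    return "click" if "results" in observation else "search"


def make_controller() -> Controller:
    # Stub policies ignore the prompt; a real agent would query an LLM here.
    return Controller(
        agents={
            "search": LaborAgent("search", lambda prompt: "red shoes"),
            "click": LaborAgent("click", lambda prompt: "item 1"),
        },
        select=select_agent,
    )


if __name__ == "__main__":
    controller = make_controller()
    for obs in ["home page", "results page"]:
        print(obs, "=>", controller.step(obs))
```

Each labor agent keeps its own interaction history, so its prompt only contains the steps it handled. That is one possible reading of "each labor LAA focuses on one type of action"; the released code at the repository above is the authoritative reference for how the controller and agents actually communicate.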
