A Survey on Complex Tasks for Goal-Directed Interactive Agents
This survey helps researchers benchmark and improve interactive agents by organizing evaluation tasks. Its contribution is incremental, since it synthesizes existing work rather than introducing new methods or results.
The paper compiles and structures tasks and environments for evaluating goal-directed interactive agents to understand current challenges, providing an up-to-date resource on a project website.
Goal-directed interactive agents, which autonomously complete tasks through interactions with their environment, can assist humans in various domains of their daily lives. Recent advances in large language models (LLMs) have led to a surge of new, increasingly challenging tasks for evaluating such agents. To properly contextualize performance across these tasks, it is imperative to understand the different challenges they pose to agents. To this end, this survey compiles relevant tasks and environments for evaluating goal-directed interactive agents, structuring them along dimensions relevant for understanding current obstacles. An up-to-date compilation of relevant resources can be found on our project website: https://coli-saar.github.io/interactive-agents.