A Song of Ice and Fire: Analyzing Textual Autotelic Agents in ScienceWorld
This work addresses how to improve autonomous learning in AI agents and is aimed at researchers in reinforcement learning and AI. Its contribution is incremental: it builds on existing autotelic frameworks and adds specific optimizations.
The study tackled the challenge of building open-ended agents that autonomously discover diverse behaviors by analyzing autotelic RL agents in the ScienceWorld textual environment, showing that selective social peer feedback, oversampling rare goals in experience replay, and following intermediate competence goal sequences lead to significant performance improvements.
Building open-ended agents that can autonomously discover a diversity of behaviours is one of the long-standing goals of artificial intelligence. This challenge can be studied in the framework of autotelic RL agents, i.e. agents that learn by selecting and pursuing their own goals, self-organizing a learning curriculum. Recent work identified language as a key dimension of autotelic learning, in particular because it enables abstract goal sampling and guidance from social peers for hindsight relabelling. From this perspective, we study the following open scientific questions: What is the impact of hindsight feedback from a social peer (e.g. selective vs. exhaustive)? How can the agent learn from very rare language goal examples in its experience replay? How can multiple forms of exploration be combined, and take advantage of easier goals as stepping stones to reach harder ones? To address these questions, we use ScienceWorld, a textual environment with rich abstract and combinatorial physics. We show that selectivity in the social peer's feedback is important, that experience replay needs to over-sample examples of rare goals, and that following self-generated goal sequences where the agent's competence is intermediate leads to significant improvements in final performance.
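Two of these findings, over-sampling rare goals in experience replay and preferring goals of intermediate competence, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class names `GoalBalancedReplay` and `IntermediateCompetenceGoalSampler`, the `(observation, goal, reward)` transition format, the competence band [0.1, 0.9] and the 20-episode window are hypothetical choices made for illustration. The replay buffer over-samples rare goals by drawing a goal uniformly before drawing one of its stored transitions; the goal sampler picks goals whose recent success rate falls in an intermediate band, which is a simplified stand-in for the goal sequences described above.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical transition format: (observation, language goal, reward).
Transition = Tuple[str, str, float]


class GoalBalancedReplay:
    """Stores transitions grouped by goal and samples goals uniformly.

    Because every goal is equally likely to be drawn, transitions of rare
    goals are over-sampled relative to their share of the buffer.
    """

    def __init__(self) -> None:
        self.by_goal: Dict[str, List[Transition]] = defaultdict(list)

    def add(self, transition: Transition) -> None:
        self.by_goal[transition[1]].append(transition)

    def sample(self, batch_size: int, rng: random.Random) -> List[Transition]:
        goals = list(self.by_goal)
        if not goals:
            raise ValueError("replay buffer is empty")
        # Draw a goal uniformly, then one of that goal's transitions uniformly.
        return [rng.choice(self.by_goal[rng.choice(goals)]) for _ in range(batch_size)]


class IntermediateCompetenceGoalSampler:
    """Prefers goals whose recent success rate lies in [low, high]."""

    def __init__(self, low: float = 0.1, high: float = 0.9, window: int = 20) -> None:
        self.low, self.high, self.window = low, high, window
        self.outcomes: Dict[str, List[bool]] = defaultdict(list)

    def record(self, goal: str, success: bool) -> None:
        history = self.outcomes[goal]
        history.append(success)
        del history[:-self.window]  # keep only the last `window` outcomes

    def competence(self, goal: str) -> float:
        history = self.outcomes.get(goal, [])
        return sum(history) / len(history) if history else 0.0

    def sample(self, known_goals: Sequence[str], rng: random.Random) -> str:
        if not known_goals:
            raise ValueError("no known goals to sample from")
        band = [g for g in known_goals if self.low <= self.competence(g) <= self.high]
        # Fall back to uniform sampling when no goal is in the intermediate band.
        return rng.choice(band) if band else rng.choice(list(known_goals))


if __name__ == "__main__":
    rng = random.Random(0)
    replay = GoalBalancedReplay()
    for _ in range(99):
        replay.add(("obs", "open the door", 0.0))
    replay.add(("obs", "boil water", 1.0))  # a single example of a rare goal
    batch = replay.sample(1000, rng)
    share = sum(t[1] == "boil water" for t in batch) / len(batch)
    print(f"rare goal share: {share:.2f} (1% of stored transitions)")
```

With two distinct goals, the single "boil water" example appears in roughly half of the sampled batch instead of 1%. Uniform weighting over goals is only one way to over-sample rare goals; the abstract does not specify the weighting, so treat this choice as an assumption.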