VIRAL: Vision-grounded Integration for Reward design And Learning
This work addresses reward misalignment in reinforcement learning, a problem of direct concern to AI developers. The contribution is incremental, since it builds on existing LLM-based reward generation methods.
The paper tackles the challenge of aligning AI agents with human intent by introducing VIRAL, a pipeline that uses multi-modal LLMs to autonomously generate and refine reward functions for reinforcement learning. In five Gymnasium environments, the pipeline accelerates the learning of new behaviors.
Alignment between humans and machines is a critical challenge in artificial intelligence today. Reinforcement learning, which aims to maximize a reward function, is particularly vulnerable to the risks associated with poorly designed reward functions. Recent advances have shown that Large Language Models (LLMs) used for reward generation can outperform humans in this context. We introduce VIRAL, a pipeline for generating and refining reward functions through the use of multi-modal LLMs. VIRAL autonomously creates and interactively improves reward functions based on a given environment and a goal prompt or annotated image. The refinement process can incorporate human feedback or be guided by a description, generated by a video LLM, of the agent's policy as shown in a video. We evaluated VIRAL in five Gymnasium environments, demonstrating that it accelerates the learning of new behaviors while improving alignment with user intent. The source code and demo video are available at: https://github.com/VIRAL-UCBL1/VIRAL and https://youtu.be/Hqo82CxVT38.
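The generate-and-refine loop described above can be illustrated with a minimal sketch. This is not the VIRAL implementation: the names `refine_reward`, `propose`, `evaluate`, and `describe` are hypothetical, and the sketch assumes that a reward function is exchanged as source text, that training yields a single scalar score, and that the best-scoring candidate is kept. In practice `propose` would wrap a multi-modal LLM call, `evaluate` would train a policy in the environment, and `describe` would return human feedback or a video LLM's description of the resulting behavior.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Candidate:
    reward_source: str  # source text of a candidate reward function
    score: float        # scalar task metric obtained after training with it


def refine_reward(
    goal: str,
    propose: Callable[[str, Optional[str], Optional[str]], str],
    evaluate: Callable[[str], float],
    describe: Callable[[str], str],
    rounds: int = 3,
) -> Candidate:
    """Propose, evaluate, and refine reward functions; return the best one.

    All three callables are hypothetical stand-ins:
      propose(goal, previous_source, feedback) -> new reward source
      evaluate(source) -> score after training with that reward
      describe(source) -> textual feedback on the resulting behavior
    """
    if rounds < 1:
        raise ValueError("rounds must be at least 1")

    best: Optional[Candidate] = None
    feedback: Optional[str] = None
    for _ in range(rounds):
        # The first round has no previous candidate and no feedback.
        previous = best.reward_source if best is not None else None
        source = propose(goal, previous, feedback)
        candidate = Candidate(source, evaluate(source))
        # Assumption: a higher score means better alignment with the goal.
        if best is None or candidate.score > best.score:
            best = candidate
        # Feedback on the latest attempt guides the next proposal.
        feedback = describe(candidate.reward_source)
    return best
```

Passing the three steps as callables keeps the loop independent of any particular LLM client or training library, so each can be replaced by a stub when testing the control flow.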