Automated Reward Design for Gran Turismo
This work addresses the problem of automating reward design for RL agents in racing games. The problem has potential real-world applications, though the contribution appears incremental because it builds on existing foundation models.
The paper tackles the challenge of designing reward functions for reinforcement learning agents in complex environments such as autonomous racing. It uses foundation models to search over the space of reward functions given text instructions, producing agents that are competitive with a champion-level RL racing agent and that can exhibit novel behaviors.
When designing reinforcement learning (RL) agents, a designer communicates the desired agent behavior through the definition of reward functions: numerical feedback given to the agent as reward or punishment for its actions. However, mapping desired behaviors to reward functions can be a difficult process, especially in complex environments such as autonomous racing. In this paper, we demonstrate how current foundation models can effectively search over a space of reward functions to produce desirable RL agents for the Gran Turismo 7 racing game, given only text-based instructions. Through a combination of LLM-based reward generation, VLM preference-based evaluation, and human feedback, we demonstrate how our system can be used to produce racing agents competitive with GT Sophy, a champion-level RL racing agent, as well as generate novel behaviors, paving the way for practical automated reward design in real-world applications.
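The abstract describes the search at a high level. Candidate reward functions are proposed, agents are trained on them, and the candidates that a judge prefers are kept. The Python below is a minimal toy sketch of such a generate-and-select loop, not the authors' system.

Everything in it is an assumption made for illustration:

- Rewards are hypothetical weighted sums of made-up features (`progress`, `off_course`, `collision`).
- Random weight perturbation in `propose` stands in for LLM-based reward generation.
- The scripted `judge` stands in for VLM preference evaluation.
- No RL training or Gran Turismo 7 interaction takes place.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class RewardSpec:
    """Hypothetical reward: a weighted sum of per-step racing features."""
    weights: Dict[str, float]

    def __call__(self, features: Dict[str, float]) -> float:
        # Features absent from the step's dict contribute zero reward.
        return sum(w * features.get(name, 0.0) for name, w in self.weights.items())


def propose(parent: RewardSpec, rng: random.Random, n: int) -> List[RewardSpec]:
    """Stand-in for LLM-based reward generation: jitter the parent's weights."""
    return [
        RewardSpec({k: v + rng.gauss(0.0, 0.5) for k, v in parent.weights.items()})
        for _ in range(n)
    ]


# A preference judge returns True if it prefers the first candidate.
Preference = Callable[[RewardSpec, RewardSpec], bool]


def select_by_preference(candidates: List[RewardSpec], prefer: Preference) -> RewardSpec:
    """Return the candidate that wins the most pairwise comparisons (round robin)."""
    wins = [0] * len(candidates)
    for i in range(len(candidates)):
        for j in range(i + 1, len(candidates)):
            if prefer(candidates[i], candidates[j]):
                wins[i] += 1
            else:
                wins[j] += 1
    return candidates[max(range(len(candidates)), key=lambda k: wins[k])]


def search(initial: RewardSpec, prefer: Preference, rounds: int = 5,
           pool: int = 4, seed: int = 0) -> RewardSpec:
    """Toy generate-and-select loop over reward functions."""
    rng = random.Random(seed)
    best = initial
    for _ in range(rounds):
        # Keep the incumbent in the pool so it competes against its own variants.
        best = select_by_preference([best] + propose(best, rng, pool), prefer)
    return best


# Hypothetical scripted judge: prefers weights closer to a hidden target,
# standing in for a VLM comparing videos of the two trained agents.
TARGET = {"progress": 1.0, "off_course": -2.0, "collision": -1.0}


def distance(spec: RewardSpec) -> float:
    return math.sqrt(sum((spec.weights[k] - TARGET[k]) ** 2 for k in TARGET))


def judge(a: RewardSpec, b: RewardSpec) -> bool:
    return distance(a) < distance(b)


start = RewardSpec({"progress": 1.0, "off_course": 0.0, "collision": 0.0})
best = search(start, judge)
print(round(distance(start), 3), round(distance(best), 3))
```

Round-robin selection is used because a preference judge returns only pairwise comparisons, not scores. The incumbent stays in each pool, so as long as the judge's preferences are consistent, a round never replaces it with a candidate the judge ranks below it.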