LG · Feb 7, 2024

Code as Reward: Empowering Reinforcement Learning with VLMs

MILA
arXiv:2402.04764v1 · 32 citations · h-index: 13 · ICML
Originality: Incremental advance
AI Analysis

This work addresses a bottleneck in RL training, the slowdown caused by frequent VLM queries, and offers a domain-specific improvement for RL applications.

The paper tackles the computational expense of using Vision-Language Models (VLMs) for reward feedback in reinforcement learning. It proposes VLM-CaR, a framework that generates dense reward functions via code generation. The resulting rewards are accurate across diverse environments and can train policies more effectively than the original sparse rewards.

Pre-trained Vision-Language Models (VLMs) are able to understand visual concepts, describe and decompose complex tasks into sub-tasks, and provide feedback on task completion. In this paper, we aim to leverage these capabilities to support the training of reinforcement learning (RL) agents. In principle, VLMs are well suited for this purpose, as they can naturally analyze image-based observations and provide feedback (reward) on learning progress. However, inference in VLMs is computationally expensive, so querying them frequently to compute rewards would significantly slow down the training of an RL agent. To address this challenge, we propose a framework named Code as Reward (VLM-CaR). VLM-CaR produces dense reward functions from VLMs through code generation, thereby significantly reducing the computational burden of querying the VLM directly. We show that the dense rewards generated through our approach are very accurate across a diverse set of discrete and continuous environments, and can be more effective in training RL policies than the original sparse environment rewards.
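
To make the idea of a reward expressed as code concrete, here is a minimal sketch of what such a program could look like once it exists. It is not the paper's implementation. The environment (a key drawn in pure yellow, a closed door drawn in pure red), the sub-task checkers `key_picked_up` and `door_open`, and the helper `has_color` are all hypothetical, chosen only for illustration. What it shows is that the reward is computed with cheap array operations on each observation, so no VLM call is needed inside the training loop.

```python
import numpy as np

# Hypothetical sub-task checkers of the kind a VLM might emit as code.
# Each takes an RGB observation of shape (H, W, 3) and returns a bool.

def has_color(obs, color, min_pixels=1):
    """Return True if at least `min_pixels` pixels match `color` exactly."""
    mask = np.all(obs == np.asarray(color, dtype=obs.dtype), axis=-1)
    return int(mask.sum()) >= min_pixels

def key_picked_up(obs):
    # Assumption: the key is pure yellow and disappears once collected.
    return not has_color(obs, (255, 255, 0))

def door_open(obs):
    # Assumption: a closed door is pure red and disappears once opened.
    return not has_color(obs, (255, 0, 0))

# Ordered sub-tasks: the door only counts after the key has been collected.
SUBTASKS = [key_picked_up, door_open]

def dense_reward(obs):
    """Fraction of ordered sub-tasks satisfied, stopping at the first unmet one."""
    done = 0
    for check in SUBTASKS:
        if not check(obs):
            break
        done += 1
    return done / len(SUBTASKS)
```

In this sketch the agent receives partial credit after each completed sub-task instead of a single sparse signal at the end of the episode. The ordering rule in `dense_reward` is a design choice of the example, not something stated in the abstract.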

