LG · AI · CV · Jun 2, 2024

FuRL: Visual-Language Models as Fuzzy Rewards for Reinforcement Learning

Yuwei Fu, Haichao Zhang, Di Wu, Wei Xu, Benoit Boulet

arXiv:2406.00645v2 · 22.0 · 31 citations · Has Code

Originality: Incremental advance

AI Analysis

This work addresses reward misalignment in reinforcement learning for sparse-reward tasks that come with textual task descriptions, and represents an incremental improvement.

The paper tackles the problem of reward misalignment when using pre-trained visual-language models as rewards in sparse-reward reinforcement learning tasks, and introduces FuRL, a lightweight fine-tuning method that improves SAC/DrQ baseline agents on Meta-world benchmark tasks.

In this work, we investigate how to leverage pre-trained visual-language models (VLMs) for online Reinforcement Learning (RL). In particular, we focus on sparse reward tasks with pre-defined textual task descriptions. We first identify the problem of reward misalignment when applying VLMs as rewards in RL tasks. To address this issue, we introduce a lightweight fine-tuning method, named Fuzzy VLM reward-aided RL (FuRL), based on reward alignment and relay RL. Specifically, we enhance the performance of SAC/DrQ baseline agents on sparse reward tasks by fine-tuning VLM representations and using relay RL to avoid local minima. Extensive experiments on the Meta-world benchmark tasks demonstrate the efficacy of the proposed method. Code is available at: https://github.com/fuyw/FuRL.
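The summary does not spell out FuRL's exact reward formula, so the following is only a minimal sketch of the general idea of a "fuzzy" VLM reward. It assumes the common CLIP-style setup, in which an observation is scored by the cosine similarity between its image embedding and the embedding of the textual task description, and that score is added, with a small weight, to the sparse task reward. The function names `cosine_similarity` and `shaped_reward`, the weight `vlm_weight`, and the toy 3-d embeddings are all hypothetical. The sketch also leaves out FuRL's reward-alignment fine-tuning and relay RL. The last case in the demo shows what reward misalignment looks like: a state can score high under the VLM while the sparse reward says the task is not solved.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (e.g. image vs. text)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def shaped_reward(sparse_reward: float,
                  image_emb: Sequence[float],
                  text_emb: Sequence[float],
                  vlm_weight: float = 0.1) -> float:
    """Hypothetical shaping: sparse task reward plus a weighted VLM similarity.

    The similarity term is dense but "fuzzy": it only approximates task
    progress, so it is kept small relative to the sparse success signal.
    """
    return sparse_reward + vlm_weight * cosine_similarity(image_emb, text_emb)


if __name__ == "__main__":
    # Toy embeddings standing in for VLM encoder outputs (assumed, not real).
    task_text = [1.0, 0.0, 0.0]         # embedding of the task description
    unrelated_frame = [0.0, 1.0, 0.0]   # observation unrelated to the task
    goal_frame = [0.8, 0.6, 0.0]        # observation close to the described goal

    print(round(shaped_reward(0.0, unrelated_frame, task_text), 3))
    print(round(shaped_reward(1.0, goal_frame, task_text), 3))
    # Misalignment: the frame looks like the goal to the VLM, yet the
    # environment reports no success (sparse reward 0).
    print(round(shaped_reward(0.0, goal_frame, task_text), 3))
```

Because the VLM term is added to the sparse reward instead of replacing it, an agent is still paid most for actual task success. Any mismatch between "looks like the goal" and "is the goal" remains in the shaped signal, though, and that gap is the misalignment problem the abstract describes.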

