RO | LG | Mar 16, 2024

ViSaRL: Visual Reinforcement Learning Guided by Human Saliency

UW
arXiv:2403.10940v3 | 20 citations | h-index: 27 | IROS
Originality: Highly original
AI Analysis

This paper addresses the inefficiency of training robots from visual data, a known bottleneck for robotics and AI researchers, and proposes a novel method to reduce it.

The paper tackles the sample inefficiency of training robots from pixel input in reinforcement learning by introducing ViSaRL, which uses human visual saliency to guide learning, resulting in nearly doubled success rates on real-robot tasks compared to baselines.

Training robots to perform complex control tasks from high-dimensional pixel input using reinforcement learning (RL) is sample-inefficient, because image observations consist primarily of task-irrelevant information. By contrast, humans are able to visually attend to task-relevant objects and areas. Based on this insight, we introduce Visual Saliency-Guided Reinforcement Learning (ViSaRL). Using ViSaRL to learn visual representations significantly improves the success rate, sample efficiency, and generalization of an RL agent on diverse tasks, including the DeepMind Control benchmark and robot manipulation both in simulation and on a real robot. We present approaches for incorporating saliency into both CNN and Transformer-based encoders. We show that visual representations learned using ViSaRL are robust to various sources of visual perturbation, including perceptual noise and scene variations. ViSaRL nearly doubles the success rate on the real-robot tasks compared to the baseline, which does not use saliency.
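
As a rough illustration of what "incorporating saliency" into a CNN encoder's input could look like, the sketch below appends a saliency map to an RGB frame as a fourth channel. This is one plausible design, assumed here for illustration; the abstract does not specify the mechanism, so this is not necessarily the authors' implementation. The function name `add_saliency_channel`, the 84x84 frame size, and the synthetic square saliency map are all hypothetical.

```python
import numpy as np


def add_saliency_channel(rgb, saliency):
    """Stack a saliency map onto an RGB frame as an extra channel.

    rgb:      float array of shape (H, W, 3)
    saliency: float array of shape (H, W), higher = more task-relevant
    returns:  float array of shape (H, W, 4)

    Assumption: the encoder accepts a 4-channel input. This is an
    illustrative choice, not a confirmed detail of ViSaRL.
    """
    if rgb.ndim != 3 or rgb.shape[-1] != 3:
        raise ValueError("rgb must have shape (H, W, 3)")
    if rgb.shape[:2] != saliency.shape:
        raise ValueError("saliency must match the image height and width")
    # Add a trailing axis to the map, then concatenate along channels.
    return np.concatenate([rgb, saliency[..., None]], axis=-1)


# Synthetic example: a random frame and a square "salient" region.
rng = np.random.default_rng(0)
frame = rng.random((84, 84, 3))
sal = np.zeros((84, 84))
sal[30:50, 30:50] = 1.0

obs = add_saliency_channel(frame, sal)
print(obs.shape)
```

The RGB values pass through unchanged, so a downstream encoder still sees the full image and can learn how much weight to give the saliency channel.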

