LG | AI | Jul 11, 2024

TLDR: Unsupervised Goal-Conditioned RL via Temporal Distance-Aware Representations

arXiv:2407.08464v2 | 13 citations | h-index: 2
Originality: Incremental advance
AI Analysis

The paper addresses the challenge of developing diverse robotic skills without external supervision. Its contribution is incremental because it builds on existing unsupervised GCRL methods.

The paper tackles the problem of limited exploration and sparse rewards in unsupervised goal-conditioned reinforcement learning (GCRL) by proposing TLDR, a method that uses temporal distance-aware representations to select faraway goals and compute intrinsic rewards. Results in six simulated locomotion environments show TLDR significantly outperforms prior methods in achieving a wide range of states.

Unsupervised goal-conditioned reinforcement learning (GCRL) is a promising paradigm for developing diverse robotic skills without external supervision. However, existing unsupervised GCRL methods often struggle to cover a wide range of states in complex environments due to their limited exploration and sparse or noisy rewards for GCRL. To overcome these challenges, we propose a novel unsupervised GCRL method that leverages TemporaL Distance-aware Representations (TLDR). Based on temporal distance, TLDR selects faraway goals to initiate exploration and computes intrinsic exploration rewards and goal-reaching rewards. Specifically, our exploration policy seeks states with large temporal distances (i.e. covering a large state space), while the goal-conditioned policy learns to minimize the temporal distance to the goal (i.e. reaching the goal). Our results in six simulated locomotion environments demonstrate that TLDR significantly outperforms prior unsupervised GCRL methods in achieving a wide range of states.
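The three uses of temporal distance described above (faraway goal selection, an exploration reward, and a goal-reaching reward) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that a learned representation `phi` already exists and that the temporal distance between two states is approximated by the Euclidean distance between their embeddings. The function names, the k-nearest-neighbor novelty heuristic, and the reward shapes are all hypothetical choices made for illustration.

```python
import numpy as np


def temporal_distance(phi_a, phi_b):
    """Assumed proxy: Euclidean distance between embeddings phi(s)."""
    return np.linalg.norm(np.asarray(phi_a) - np.asarray(phi_b), axis=-1)


def select_faraway_goal(phi_buffer, k=1):
    """Return the index of the buffer state that is farthest from its
    k-th nearest neighbor in embedding space (an illustrative heuristic
    for picking a 'faraway' goal)."""
    phi_buffer = np.asarray(phi_buffer, dtype=float)
    diffs = phi_buffer[:, None, :] - phi_buffer[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    # After sorting each row, column 0 is the zero self-distance,
    # so column k holds the distance to the k-th nearest neighbor.
    knn_dist = np.sort(dists, axis=1)[:, k]
    return int(np.argmax(knn_dist))


def exploration_reward(phi_next, phi_buffer, k=1):
    """Illustrative intrinsic reward: distance from the new state's
    embedding to its k-th nearest neighbor among visited states."""
    d = np.linalg.norm(np.asarray(phi_buffer, dtype=float) - np.asarray(phi_next), axis=-1)
    return float(np.sort(d)[k - 1])


def goal_reaching_reward(phi_s, phi_next, phi_goal):
    """Illustrative goal-reaching reward: the reduction in assumed
    temporal distance to the goal achieved by one transition."""
    return float(temporal_distance(phi_s, phi_goal) - temporal_distance(phi_next, phi_goal))


# Toy embeddings: three states clustered near the origin, one far away.
buffer = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]])
goal_idx = select_faraway_goal(buffer)
r_explore = exploration_reward(np.array([5.0, 6.0]), buffer)
r_goal = goal_reaching_reward(np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([3.0, 0.0]))
```

In this toy setup the isolated state is chosen as the goal, and a transition that moves toward the goal earns a positive goal-reaching reward. In practice `phi` would be trained so that embedding distances reflect the number of environment steps between states; that training step is outside the scope of this sketch.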

