TW-CRL: Time-Weighted Contrastive Reward Learning for Efficient Inverse Reinforcement Learning
This work targets episodic reinforcement learning tasks with sparse rewards and hidden failures, with applications in domains such as robotics and navigation. The contribution appears incremental, since it builds on existing inverse reinforcement learning methods.
The paper addresses sparse rewards and hidden trap states in episodic reinforcement learning. It proposes TW-CRL, an inverse reinforcement learning framework that learns a time-weighted contrastive reward from both successful and failed demonstrations. The authors report improved efficiency and robustness over state-of-the-art methods on navigation and robotic manipulation benchmarks.
Episodic tasks in Reinforcement Learning (RL) often pose challenges due to sparse reward signals and high-dimensional state spaces, which hinder efficient learning. Additionally, these tasks often feature hidden "trap states" -- irreversible failures that prevent task completion but do not provide explicit negative rewards to guide agents away from repeated errors. To address these issues, we propose Time-Weighted Contrastive Reward Learning (TW-CRL), an Inverse Reinforcement Learning (IRL) framework that leverages both successful and failed demonstrations. By incorporating temporal information, TW-CRL learns a dense reward function that identifies critical states associated with success or failure. This approach not only enables agents to avoid trap states but also encourages meaningful exploration beyond simple imitation of expert trajectories. Empirical evaluations on navigation tasks and robotic manipulation benchmarks demonstrate that TW-CRL surpasses state-of-the-art methods, achieving improved efficiency and robustness.
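To make the idea of a time-weighted reward learned from both successful and failed demonstrations concrete, here is a minimal toy sketch. It is not the paper's implementation: the abstract does not give the weighting scheme or the loss, so the linear-in-time weights, the signed regression targets standing in for the contrastive objective, the linear reward model, and all names (`time_weights`, `build_targets`, `fit_linear_reward`, `reward`) are assumptions made for illustration.

```python
import numpy as np

def time_weights(length, power=1.0):
    """Assumed weighting: grows from 1/length at the first step to 1 at the last."""
    t = np.arange(1, length + 1)
    return (t / length) ** power

def build_targets(successes, failures, power=1.0):
    """Stack states with signed targets: positive for successful
    trajectories, negative for failed ones, larger near the end."""
    states, targets = [], []
    for traj in successes:
        states.append(traj)
        targets.append(time_weights(len(traj), power))
    for traj in failures:
        states.append(traj)
        targets.append(-time_weights(len(traj), power))
    return np.concatenate(states), np.concatenate(targets)

def fit_linear_reward(states, targets, ridge=1e-3):
    """Ridge regression of the targets on [state, 1] (a stand-in reward model)."""
    X = np.hstack([states, np.ones((len(states), 1))])
    A = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ targets)

def reward(w, state):
    """Dense reward estimate for a single state."""
    return float(np.append(state, 1.0) @ w)

# Toy 1-D world (hypothetical): the goal is at x = +1, a trap at x = -1.
successes = [np.linspace(0.0, 1.0, 10).reshape(-1, 1)]
failures = [np.linspace(0.0, -1.0, 10).reshape(-1, 1)]
states, targets = build_targets(successes, failures)
w = fit_linear_reward(states, targets)

print(reward(w, np.array([1.0])))   # positive near the goal
print(reward(w, np.array([-1.0])))  # negative near the trap
```

In this toy setting the learned reward is dense: states near where successful demonstrations end score high, and states near where failed demonstrations end score low, even though no explicit negative reward was ever provided for the trap.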