LG | AI | May 20, 2022

Learning Dense Reward with Temporal Variant Self-Supervision

arXiv:2205.10431v2 | 1 citation | h-index: 9
Originality: Incremental advance
AI Analysis

This work addresses the challenge of reward specification in real-world robotic applications like contact-rich manipulation, offering an incremental improvement over prior methods.

The paper tackles the problem of learning dense rewards for complex robotic manipulation tasks where explicit reward functions are unavailable. It proposes a more efficient and robust sampling and learning method, and preliminary results show faster convergence than baselines on joint-assembly and door-opening tasks.

Rewards play an essential role in reinforcement learning. In contrast to rule-based game environments with well-defined reward functions, complex real-world robotic applications, such as contact-rich manipulation, lack explicit and informative descriptions that can be used directly as a reward. Previous efforts have shown that it is possible to algorithmically extract dense rewards directly from multimodal observations. In this paper, we aim to extend this line of work by proposing a more efficient and robust way of sampling and learning. In particular, our sampling approach utilizes temporal variance to simulate the fluctuating state and action distribution of a manipulation task. We then propose a network architecture for self-supervised learning to better incorporate temporal information in latent representations. We tested our approach in two experimental setups, namely joint-assembly and door-opening. Preliminary results show that our approach is effective and efficient in learning dense rewards, and the learned rewards lead to faster convergence than baselines.
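The abstract does not spell out the sampling procedure, so the following is only a generic sketch of temporally varied pair sampling for self-supervision, not the authors' method. It draws two time indices from one trajectory with a randomly varying gap and uses the normalized gap as a training target. The function name `sample_temporal_pairs`, the Gaussian gap distribution, and the parameters `mean_gap` and `gap_std` are all assumptions made for illustration.

```python
import random


def sample_temporal_pairs(traj_len, n_pairs, mean_gap=5.0, gap_std=2.0, seed=None):
    """Draw (t, t_later, label) triples from a trajectory of length traj_len.

    Hypothetical illustration: the gap between the two indices is random
    (Gaussian, rounded, clipped to [1, traj_len - 1]), so the sampled pairs
    span both short and long time intervals. The label is the gap divided
    by (traj_len - 1), a value in (0, 1].
    """
    if traj_len < 2:
        raise ValueError("trajectory needs at least two time steps")
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        # Random temporal gap, at least 1 and at most traj_len - 1.
        gap = max(1, round(rng.gauss(mean_gap, gap_std)))
        gap = min(gap, traj_len - 1)
        # Start index chosen so that t + gap stays inside the trajectory.
        t = rng.randrange(0, traj_len - gap)
        pairs.append((t, t + gap, gap / (traj_len - 1)))
    return pairs


if __name__ == "__main__":
    for t, t_later, label in sample_temporal_pairs(50, 5, seed=0):
        print(t, t_later, round(label, 3))
```

In such a setup, the observations at `t` and `t_later` would be fed to an encoder and the label would supervise a prediction of how far apart they are in time. How the paper actually turns temporal structure into a dense reward is not stated in the abstract.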

