LG AI, Apr 23, 2022

Discovering Intrinsic Reward with Contrastive Random Walk

Zixuan Pan, Zihao Wei, Yidong Huang, Aditya Gupta

arXiv:2204.10976v11.8, h-index: 5, Has Code

Originality: Incremental advance

AI Analysis

This paper targets faster policy convergence in non-tabular, sparse-reward settings for reinforcement learning practitioners, though the advance appears incremental because it builds on existing curiosity methods.

The paper addresses slow convergence in sparse-reward reinforcement learning by using Contrastive Random Walk as a curiosity method that supplies intrinsic rewards. It reports the highest reward among the compared methods for the same number of iterations, and robustness across different random initializations.

The aim of this paper is to demonstrate the efficacy of using Contrastive Random Walk as a curiosity method to achieve faster convergence to the optimal policy. Contrastive Random Walk defines the transition matrix of a random walk with the help of neural networks. It learns a meaningful state representation with a closed loop. The loss of Contrastive Random Walk serves as an intrinsic reward and is added to the environment reward. Our method works well in non-tabular sparse reward scenarios, in the sense that it receives the highest reward within the same number of iterations compared to other methods. Contrastive Random Walk is also more robust: its performance does not change much across different random initializations of the environment. We also find that adaptive restart and an appropriate temperature are crucial to the performance of Contrastive Random Walk.
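
The mechanism described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that state embeddings are already available as arrays (standing in for the neural network), that transitions come from a temperature-scaled softmax over cosine similarities, and that the closed loop is a forward-then-backward walk that should return to its starting node. All function names and the `scale` parameter are hypothetical, and adaptive restart is not modeled.

```python
import numpy as np

def row_softmax(logits):
    # Numerically stable softmax over each row.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def l2_normalize(x, eps=1e-8):
    # Scale each embedding to unit length so dot products are cosine similarities.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def transition_matrix(src, dst, temperature):
    # Row i holds the probability of walking from src node i to each dst node.
    similarity = l2_normalize(src) @ l2_normalize(dst).T
    return row_softmax(similarity / temperature)

def crw_cycle_loss(frames, temperature=0.1):
    # frames: list of (N, D) embedding arrays, one per time step (assumed input).
    n = frames[0].shape[0]
    # Closed loop: walk forward through the frames, then back to the start.
    path = frames + frames[-2::-1]
    walk = np.eye(n)
    for src, dst in zip(path[:-1], path[1:]):
        walk = walk @ transition_matrix(src, dst, temperature)
    # Cross-entropy against the identity: each node should return to itself.
    return -np.mean(np.log(np.diag(walk) + 1e-12))

def shaped_reward(env_reward, intrinsic, scale=1.0):
    # The abstract says the loss is added to the environment reward;
    # `scale` is a hypothetical weighting knob, with 1.0 meaning plain addition.
    return env_reward + scale * intrinsic
```

In this sketch a lower `temperature` sharpens each transition row toward its most similar node, while a higher one spreads probability more evenly, which is one plausible reason the abstract singles out temperature as important.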
