LG | Jul 31, 2024

Black Box Meta-Learning Intrinsic Rewards

Octavio Pappalardo, Rodrigo Ramele, Juan Miguel Santos

arXiv:2407.21546v3 | 1 citation | h-index: 3 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses data efficiency and generalization issues in RL for applications like robotics, but it is incremental as it builds on existing meta-learning and intrinsic reward methods.

The paper tackles the challenge of improving reinforcement learning in sparse-reward environments by meta-learning intrinsic rewards, treating policy updates as black boxes to bypass meta-gradient computation, and demonstrates effectiveness in continuous control tasks with sparse rewards.

The broader application of reinforcement learning (RL) is limited by challenges including data efficiency, generalization capability, and ability to learn in sparse-reward environments. Meta-learning has emerged as a promising approach to address these issues by optimizing components of the learning algorithm to meet desired characteristics. Additionally, a different line of work has extensively studied the use of intrinsic rewards to enhance the exploration capabilities of algorithms. This work investigates how meta-learning can improve the training signal received by RL agents. We introduce a method to learn intrinsic rewards within a reinforcement learning framework that bypasses the typical computation of meta-gradients through an optimization process by treating policy updates as black boxes. We validate our approach against training with extrinsic rewards, demonstrating its effectiveness, and additionally compare it to the use of a meta-learned advantage function. Experiments are carried out on distributions of continuous control tasks with both parametric and non-parametric variations. Furthermore, only sparse rewards are used during evaluation. Code is available at: https://github.com/Octavio-Pappalardo/Meta-learning-rewards
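
To make the "policy updates as black boxes" idea concrete, here is a minimal toy sketch. It is not the authors' algorithm or code. It assumes a 3-armed bandit with a sparse extrinsic reward, a softmax inner policy, and per-arm Gaussian intrinsic rewards with learnable means `eta`. All of these, along with every function name and hyperparameter, are hypothetical choices for illustration. The outer loop uses a score-function (REINFORCE-style) estimate, so it only needs the log-probability of the sampled intrinsic rewards and the resulting extrinsic return. It never differentiates through `inner_update`.

```python
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def inner_update(logits, action, reward, lr=0.5):
    """Black-box policy update: one REINFORCE step on a softmax bandit policy."""
    probs = softmax(logits)
    # d log pi(action) / d logit_k = 1[k == action] - probs[k]
    return [l + lr * reward * ((1.0 if k == action else 0.0) - p)
            for k, (l, p) in enumerate(zip(logits, probs))]


def run_lifetime(eta, goal, rng, steps=20, sigma=0.5):
    """Train a fresh policy using only sampled intrinsic rewards.

    Returns the extrinsic return and the score d log p(r_int) / d eta.
    """
    n = len(eta)
    logits = [0.0] * n
    score = [0.0] * n
    extrinsic_return = 0.0
    for _ in range(steps):
        action = rng.choices(range(n), weights=softmax(logits))[0]
        r_int = rng.gauss(eta[action], sigma)  # stochastic intrinsic reward
        score[action] += (r_int - eta[action]) / sigma ** 2
        extrinsic_return += 1.0 if action == goal else 0.0  # sparse signal
        logits = inner_update(logits, action, r_int)  # never differentiated
    return extrinsic_return, score


def meta_train(n_arms=3, goal=2, iterations=200, batch=16, meta_lr=0.02, seed=0):
    """Outer loop: adjust eta with a score-function gradient estimate."""
    rng = random.Random(seed)
    eta = [0.0] * n_arms
    for _ in range(iterations):
        results = [run_lifetime(eta, goal, rng) for _ in range(batch)]
        baseline = sum(ret for ret, _ in results) / batch  # variance reduction
        for k in range(n_arms):
            grad = sum((ret - baseline) * sc[k] for ret, sc in results) / batch
            eta[k] += meta_lr * grad
    return eta
```

Calling `meta_train()` returns the learned per-arm intrinsic reward means. Because the inner policy only ever sees the intrinsic reward, any improvement in extrinsic return has to come from the outer loop shaping `eta`. Swapping `inner_update` for a different learner would not change the outer update, which is the sense in which the policy update is a black box here.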
