LG · Jun 10, 2024

Boosting Robustness in Preference-Based Reinforcement Learning with Dynamic Sparsity

Calarina Muslimani, Bram Grooten, Deepak Ranganatha Sastry Mamillapalli, Mykola Pechenizkiy, Decebal Constantin Mocanu, Matthew E. Taylor

arXiv:2406.06495v3 · 2.6

Originality Highly original

AI Analysis

This work addresses the challenge of robust reward learning for autonomous agents in human-centered settings, representing an incremental advancement in preference-based reinforcement learning.

The paper tackles the problem of agents learning from human preferences in noisy environments. It proposes R2N, a preference-based reinforcement learning algorithm that uses dynamic sparse training to focus on task-relevant features, and reports significant performance improvements over existing methods in simulated robotic environments.
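As a rough illustration of what learning from human preferences can look like in practice, the sketch below uses the Bradley-Terry model, a common choice in preference-based RL. The text above does not state which preference model R2N uses, so this is an assumption, and the function names `preference_probability` and `preference_loss` are hypothetical.

```python
import numpy as np


def preference_probability(rewards_a, rewards_b):
    """Probability that segment A is preferred over segment B.

    Assumes a Bradley-Terry model over summed predicted rewards
    (an illustrative assumption, not confirmed by the source).
    """
    # Sum the predicted per-step rewards over each trajectory segment.
    return_a = float(np.sum(rewards_a))
    return_b = float(np.sum(rewards_b))
    # Logistic function of the return difference.
    return 1.0 / (1.0 + np.exp(-(return_a - return_b)))


def preference_loss(rewards_a, rewards_b, label):
    """Cross-entropy loss for one labeled pair.

    label is 1.0 if the teacher preferred A and 0.0 if it preferred B.
    """
    p = preference_probability(rewards_a, rewards_b)
    eps = 1e-12  # guards against log(0)
    return -(label * np.log(p + eps) + (1.0 - label) * np.log(1.0 - p + eps))
```

Minimizing this loss over many labeled pairs pushes the reward model to assign higher total reward to the segments the teacher prefers.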

To integrate into human-centered environments, autonomous agents must learn from and adapt to humans in their native settings. Preference-based reinforcement learning (PbRL) can enable this by learning reward functions from human preferences. However, humans live in a world full of diverse information, most of which is irrelevant to completing any particular task. It then becomes essential that agents learn to focus on the subset of task-relevant state features. To that end, this work proposes R2N (Robust-to-Noise), the first PbRL algorithm that leverages principles of dynamic sparse training to learn robust reward models that can focus on task-relevant features. In experiments with a simulated teacher, we demonstrate that R2N can adapt the sparse connectivity of its neural networks to focus on task-relevant features, enabling R2N to significantly outperform several sparse training and PbRL algorithms across simulated robotic environments.
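To make "adapting the sparse connectivity" more concrete, here is a minimal sketch of one prune-and-regrow step in the style of generic dynamic sparse training: drop the weakest active connections, then grow the same number at random inactive positions. The abstract does not specify R2N's actual update rule, so the magnitude-based pruning, the random regrowth, the zero initialization of new weights, and the name `prune_and_regrow` are all assumptions made for illustration.

```python
import numpy as np


def prune_and_regrow(weights, mask, fraction, rng):
    """One topology update on a single sparse weight matrix.

    weights:  2-D array of weights (zero where mask is 0)
    mask:     2-D array of 0/1 flags marking active connections
    fraction: share of active connections to replace
    rng:      numpy.random.Generator used for random regrowth
    Returns updated copies; the number of active connections is unchanged.
    """
    weights = weights.copy()
    mask = mask.copy()

    active = np.flatnonzero(mask)
    inactive = np.flatnonzero(mask == 0)  # candidates recorded before pruning
    n_update = min(int(fraction * active.size), inactive.size)
    if n_update == 0:
        return weights, mask

    # Prune: remove the active connections with the smallest magnitude.
    magnitudes = np.abs(weights.flat[active])
    drop = active[np.argsort(magnitudes)[:n_update]]
    mask.flat[drop] = 0
    weights.flat[drop] = 0.0

    # Regrow: activate the same number of previously inactive connections.
    # New weights start at zero (an assumption of this sketch).
    grow = rng.choice(inactive, size=n_update, replace=False)
    mask.flat[grow] = 1
    weights.flat[grow] = 0.0

    return weights, mask
```

Applied periodically during training, a step like this lets connections attached to uninformative inputs be pruned while the overall sparsity level stays fixed.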
