LG AI SY | Nov 30, 2023

Predictable Reinforcement Learning Dynamics through Entropy Rate Minimization

Daniel Jarne Ornia, Giannis Delimpaltadakis, Jens Kober, Javier Alonso-Mora

arXiv:2311.18703v53.84 citations | h-index: 7 | Has Code

Originality: Incremental advance

AI Analysis

The paper addresses safety and predictability challenges in human-robot interaction. It is an incremental advance: it adapts existing RL methods by adding a new regularization term.

The paper tackles unpredictable behavior in reinforcement learning agents, which can cause safety issues in human-robot interaction. It proposes Predictability-Aware RL (PARL), which maximizes a linear combination of the discounted reward and the negative entropy rate of the agent's trajectories, trading off optimality against predictability. The resulting policies are predictable and near-optimal in tasks inspired by human-robot use-cases.

In Reinforcement Learning (RL), agents have no incentive to exhibit predictable behaviors, and are often pushed (e.g. through policy entropy regularisation) to randomise their actions in favor of exploration. This often makes it challenging for other agents and humans to predict an agent's behavior, triggering unsafe scenarios (e.g. in human-robot interaction). We propose a novel method to induce predictable behavior in RL agents, termed Predictability-Aware RL (PARL), employing the agent's trajectory entropy rate to quantify predictability. Our method maximizes a linear combination of a standard discounted reward and the negative entropy rate, thus trading off optimality with predictability. We show how the entropy rate can be formally cast as an average reward and how entropy-rate value functions can be estimated from a learned model and incorporated into policy-gradient algorithms, and we demonstrate how this approach produces predictable (near-optimal) policies in tasks inspired by human-robot use-cases.
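The trade-off described in the abstract can be illustrated with a minimal sketch. It assumes that, under a fixed policy, the agent's state process is an ergodic Markov chain with a known transition matrix, and it uses the standard entropy-rate formula for such a chain, H = -sum_i mu_i sum_j P_ij log P_ij, where mu is the stationary distribution. The function names, the trade-off weight `k`, the two toy transition matrices, and the return values are all hypothetical and chosen for illustration. The paper estimates these quantities from a learned model inside a policy-gradient loop, which this sketch does not attempt.

```python
import math


def stationary_distribution(P, iters=10_000, tol=1e-12):
    """Approximate the stationary distribution of a row-stochastic matrix P
    by repeatedly applying mu <- mu P, starting from the uniform distribution."""
    n = len(P)
    mu = [1.0 / n] * n
    for _ in range(iters):
        nxt = [sum(mu[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(mu, nxt)) < tol:
            return nxt
        mu = nxt
    return mu


def entropy_rate(P):
    """Entropy rate in nats per step: -sum_i mu_i sum_j P_ij log P_ij.
    Zero-probability transitions contribute nothing and are skipped."""
    mu = stationary_distribution(P)
    return -sum(
        mu[i] * p * math.log(p)
        for i, row in enumerate(P)
        for p in row
        if p > 0.0
    )


def predictability_aware_objective(discounted_return, P, k):
    """Hypothetical combined objective: discounted return minus k times the
    entropy rate. k >= 0 is an assumed trade-off weight."""
    return discounted_return - k * entropy_rate(P)


# Toy closed-loop dynamics on two states (illustrative values only).
P_predictable = [[0.0, 1.0], [1.0, 0.0]]   # deterministic alternation
P_erratic = [[0.5, 0.5], [0.5, 0.5]]       # uniformly random next state

# Assumed discounted returns: the erratic policy earns slightly more reward.
J_predictable = predictability_aware_objective(9.0, P_predictable, k=2.0)
J_erratic = predictability_aware_objective(10.0, P_erratic, k=2.0)
```

With these assumed numbers, the erratic policy has the higher raw return, but its entropy rate of log 2 nats per step is penalized, so the deterministic policy scores higher on the combined objective. Setting `k` to zero recovers the plain discounted-return comparison.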

