AI, LG · Aug 15, 2024

BCR-DRL: Behavior- and Context-aware Reward for Deep Reinforcement Learning in Human-AI Coordination

Xin Hao, Bahareh Nakisa, Mohmmad Naim Rastgoo, Gaoyang Pang

arXiv:2408.07877v57.33 citations · h-index: 6

Originality: Incremental advance

AI Analysis

This work addresses the problem of improving coordination between AI agents and human partners in interactive environments, representing an incremental advancement with specific performance gains.

The paper tackles the challenges of sparse rewards and unpredictable human behaviors in deep reinforcement learning for human-AI coordination by proposing a behavior- and context-aware reward (BCR) method, which increases cumulative sparse rewards by approximately 20% and improves sample efficiency by around 38% compared to state-of-the-art baselines.

Deep reinforcement learning (DRL) offers a powerful framework for training AI agents to coordinate with human partners. However, DRL faces two critical challenges in human-AI coordination (HAIC): sparse rewards and unpredictable human behaviors. These challenges significantly limit DRL's ability to identify effective coordination policies, because they impair its capability to optimize exploration and exploitation. To address these limitations, we propose an innovative behavior- and context-aware reward (BCR) for DRL, which optimizes exploration and exploitation by leveraging human behaviors and contextual information in HAIC. Our BCR consists of two components: (i) a novel dual intrinsic rewarding scheme to enhance exploration, which combines an AI self-motivated intrinsic reward and a human-motivated intrinsic reward, both designed to increase the capture of sparse rewards through a logarithmic-based strategy; and (ii) a new context-aware weighting mechanism for the designed rewards to improve exploitation, which helps the AI agent prioritize actions that better coordinate with the human partner by utilizing contextual information that reflects the evolution of learning. Extensive simulations in the Overcooked environment demonstrate that our approach can increase the cumulative sparse rewards by approximately 20% and improve the sample efficiency by around 38% compared to state-of-the-art baselines.
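To make the two components more concrete, the following is a minimal sketch of how a sparse reward might be combined with two logarithmic intrinsic bonuses under a weight that changes as learning progresses. It is not the paper's formulation: the abstract gives no equations, so the count-based bonus `log(1 + 1/n)`, the linear decay schedule, and every name (`ShapedReward`, `decay_steps`, `ai_state`, `human_state`) are assumptions chosen only for illustration.

```python
import math
from collections import Counter


class ShapedReward:
    """Illustrative reward shaper (hypothetical, not the paper's BCR)."""

    def __init__(self, decay_steps=1000):
        # Visit counts stand in for "novelty" of the AI's own situation
        # and of the observed human behavior (both are assumptions).
        self.ai_counts = Counter()
        self.human_counts = Counter()
        self.decay_steps = decay_steps
        self.step = 0

    @staticmethod
    def _log_bonus(count):
        # Assumed logarithmic bonus: large for rarely seen keys,
        # shrinking toward 0 as the count grows.
        return math.log(1.0 + 1.0 / count)

    def context_weight(self):
        # Assumed "context": training progress. The weight on the
        # intrinsic terms decays linearly from 1 to 0.
        return max(0.0, 1.0 - self.step / self.decay_steps)

    def __call__(self, ai_state, human_state, sparse_reward):
        weight = self.context_weight()
        self.step += 1
        self.ai_counts[ai_state] += 1
        self.human_counts[human_state] += 1
        r_ai = self._log_bonus(self.ai_counts[ai_state])
        r_human = self._log_bonus(self.human_counts[human_state])
        # Early on, intrinsic bonuses dominate (exploration); later the
        # sparse task reward dominates (exploitation).
        return sparse_reward + weight * (r_ai + r_human)


shaper = ShapedReward(decay_steps=2)
first = shaper("s0", "h0", 0.0)   # full weight, both keys seen once
second = shaper("s0", "h0", 0.0)  # half weight, both keys seen twice
third = shaper("s1", "h1", 1.0)   # weight has decayed to 0
```

In this sketch the shaped reward starts out dominated by the two bonuses and reduces to the plain sparse reward once `decay_steps` calls have been made. A real implementation would take its bonus definitions and weighting signal from the paper itself.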
