AI · Feb 20, 2024

XRL-Bench: A Benchmark for Evaluating and Comparing Explainable Reinforcement Learning Techniques

Yu Xiong, Zhipeng Hu, Ye Huang, Runze Wu, Kai Guan, Xingchen Fang, Ji Jiang, Tianze Zhou, Yujing Hu, Haoyu Liu, Tangjie Lyu, Changjie Fan

arXiv:2402.12685v1 · 4.25 citations · h-index: 16 · Has Code · KDD

Originality: Synthesis-oriented

AI Analysis

This work addresses the problem of evaluating explainable AI in reinforcement learning, for both researchers and practitioners. Its contribution is incremental: it provides a benchmark built on existing XRL methods.

The paper tackles the lack of a unified evaluation framework for explainable reinforcement learning (XRL) methods by introducing XRL-Bench, a standardized benchmark for assessing and comparing state-explaining techniques, and demonstrates its utility with a new method called TabularSHAP in real-world online gaming services.

Reinforcement Learning (RL) has demonstrated substantial potential across diverse fields, yet understanding its decision-making process, especially in real-world scenarios where rationality and safety are paramount, is an ongoing challenge. This paper delves into Explainable RL (XRL), a subfield of Explainable AI (XAI) aimed at unravelling the complexities of RL models. Our focus rests on state-explaining techniques, a crucial subset within XRL methods, as they reveal the underlying factors influencing an agent's actions at any given time. Despite their significant role, the lack of a unified evaluation framework hinders assessment of their accuracy and effectiveness. To address this, we introduce XRL-Bench, a unified standardized benchmark tailored for the evaluation and comparison of XRL methods, encompassing three main modules: standard RL environments, explainers based on state importance, and standard evaluators. XRL-Bench supports both tabular and image data for state explanation. We also propose TabularSHAP, an innovative and competitive XRL method. We demonstrate the practical utility of TabularSHAP in real-world online gaming services and offer an open-source benchmark platform for the straightforward implementation and evaluation of XRL methods. Our contributions facilitate the continued progression of XRL technology.
