LG · Dec 4, 2025

SHAP-Guided Kernel Actor-Critic for Explainable Reinforcement Learning

Na Li, Hangguan Shan, Wei Ni, Wenjie Zhang, Xinyu Li

arXiv:2512.05291v24.1 · h-index: 30

Originality: Incremental advance

AI Analysis

This work addresses the need for explainable reinforcement learning in domains like robotics and control, though it appears incremental as it builds on existing actor-critic and SHAP methods.

The paper tackles the problem of limited interpretability in actor-critic reinforcement learning methods by proposing RSA2C, an attribution-aware, kernelized algorithm that uses SHAP-based state attributions to guide training, achieving improved efficiency, stability, and interpretability in continuous-control environments.

Actor-critic (AC) methods are a cornerstone of reinforcement learning (RL) but offer limited interpretability. Current explainable RL methods seldom use state attributions to assist training. Rather, they treat all state features equally, thereby neglecting the heterogeneous impacts of individual state dimensions on the reward. We propose RKHS-SHAP-based Advanced Actor-Critic (RSA2C), an attribution-aware, kernelized, two-timescale AC algorithm, including Actor, Value Critic, and Advantage Critic. The Actor is instantiated in a vector-valued reproducing kernel Hilbert space (RKHS) with a Mahalanobis-weighted operator-valued kernel, while the Value Critic and Advantage Critic reside in scalar RKHSs. These RKHS-enhanced components use sparsified dictionaries: the Value Critic maintains its own dictionary, while the Actor and Advantage Critic share one. State attributions, computed from the Value Critic via RKHS-SHAP (kernel mean embedding for on-manifold and conditional mean embedding for off-manifold expectations), are converted into Mahalanobis-gated weights that modulate Actor gradients and Advantage Critic targets. We derive a global, non-asymptotic convergence bound under state perturbations, showing stability through the perturbation-error term and efficiency through the convergence-error term. Empirical results on three continuous-control environments show that RSA2C achieves efficiency, stability, and interpretability.
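
The abstract describes state attributions being converted into Mahalanobis weights that shape the Actor's kernel. As a rough illustration only, the sketch below shows one way per-dimension attribution scores could be turned into diagonal weights for a Gaussian kernel. The function names, the normalization rule, and the Gaussian form are all assumptions made for this example; the listing does not state the paper's exact gating rule or kernel, and the RKHS-SHAP computation itself is not reproduced here.

```python
import numpy as np

def attribution_to_weights(phi, eps=1e-8):
    """Hypothetical gating rule: scale absolute attribution scores so the
    weights are nonnegative and sum (approximately) to the state dimension."""
    mag = np.abs(np.asarray(phi, dtype=float))
    return mag.size * mag / (mag.sum() + eps)

def mahalanobis_gaussian_kernel(x, y, weights, bandwidth=1.0):
    """Gaussian kernel with a diagonal Mahalanobis metric.

    Dimensions with larger weights contribute more to the distance, so the
    kernel is more sensitive to changes in highly attributed state features.
    """
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    sq_dist = float(np.sum(weights * diff ** 2))
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2))

# Illustrative attribution scores for a 3-dimensional state (made-up values).
phi = [0.5, -0.1, 0.0]
w = attribution_to_weights(phi)

origin = np.zeros(3)
k_same = mahalanobis_gaussian_kernel(origin, origin, w)
k_ignored_dim = mahalanobis_gaussian_kernel(origin, [0.0, 0.0, 1.0], w)
k_important_dim = mahalanobis_gaussian_kernel(origin, [1.0, 0.0, 0.0], w)
```

In this sketch, a perturbation along a state dimension with zero attribution leaves the kernel value unchanged, while the same perturbation along a highly attributed dimension lowers it. That is the intuition behind attribution-aware weighting, though the actual algorithm uses an operator-valued kernel and sparsified dictionaries that this example omits.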
