LG · AI · Aug 21, 2024

Using Part-based Representations for Explainable Deep Reinforcement Learning

arXiv:2408.11455v2 · 1 citation · h-index: 33
Originality: Incremental advance
AI Analysis

This work addresses interpretability in deep reinforcement learning and is aimed at researchers and practitioners. It is an incremental advance because it builds on existing part-based methods.

The paper tackles the challenge of training part-based representations in deep reinforcement learning. It proposes a non-negative training approach for actor models that improves interpretability, and it demonstrates the approach's effectiveness on the Cartpole benchmark.
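The interpretability argument rests on additivity. If an actor's output layer has non-negative weights and receives non-negative activations (for example, ReLU outputs), then each action score is a sum of non-negative per-unit contributions. No hidden unit can cancel another, so each score splits cleanly into "parts". The sketch below illustrates only that property. The function name `part_contributions`, the layer sizes, and the random data are hypothetical and do not come from the paper.

```python
import numpy as np

def part_contributions(W, h):
    """Per-unit contributions of hidden activations h to each action score.

    W: (n_actions, n_hidden) weight matrix, assumed non-negative.
    h: (n_hidden,) activations, assumed non-negative (e.g. ReLU outputs).
    Returns an (n_actions, n_hidden) array whose rows sum to W @ h.
    """
    # The additive, cancellation-free reading only holds under non-negativity.
    assert np.all(W >= 0) and np.all(h >= 0), "part-based reading needs non-negativity"
    return W * h[np.newaxis, :]

rng = np.random.default_rng(0)
W = rng.uniform(0.0, 1.0, size=(2, 4))   # 2 actions (e.g. Cartpole left/right), 4 hidden units
h = np.maximum(rng.normal(size=4), 0.0)  # ReLU keeps activations non-negative
contrib = part_contributions(W, h)
scores = W @ h
print(np.allclose(contrib.sum(axis=1), scores))  # True: scores decompose additively
```

Each entry `contrib[a, j]` can be read as "how much hidden unit `j` pushes toward action `a`". Because every entry is non-negative, the largest entries identify the parts that drive a decision.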

Utilizing deep learning models to learn part-based representations holds significant potential for interpretable-by-design approaches, as these models incorporate latent causes obtained from feature representations through simple addition. However, training a part-based learning model presents challenges, particularly in enforcing non-negative constraints on the model's parameters, which can result in training difficulties such as instability and convergence issues. Moreover, applying such approaches in Deep Reinforcement Learning (RL) is even more demanding due to the inherent instabilities that impact many optimization methods. In this paper, we propose a non-negative training approach for actor models in RL, enabling the extraction of part-based representations that enhance interpretability while adhering to non-negative constraints. To this end, we employ a non-negative initialization technique, as well as a modified sign-preserving training method, which can ensure better gradient flow compared to existing approaches. We demonstrate the effectiveness of the proposed approach using the well-known Cartpole benchmark.
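The abstract names two ingredients: non-negative initialization and a sign-preserving training rule. The paper's exact schemes are not reproduced here. The following is a minimal sketch under stated assumptions.

- `nonneg_init` is a hypothetical initializer that uses half-normal samples scaled by `1/sqrt(fan_in)`.
- `projected_step` is a common baseline. It takes a plain gradient step and then clips negative weights to zero.
- `exp_grad_step` is an exponentiated-gradient (multiplicative) update. It is a known sign-preserving rule, included here only as an illustrative stand-in. It is not the paper's modified method.

The sketch shows the qualitative difference between the two rules: projection can leave weights at exactly zero, while a multiplicative update never changes a weight's sign. It does not attempt to reproduce the paper's gradient-flow results.

```python
import numpy as np

def nonneg_init(fan_in, fan_out, rng):
    """Hypothetical non-negative initializer: half-normal samples scaled by 1/sqrt(fan_in).
    An assumption for illustration; the paper's initializer may differ."""
    return np.abs(rng.normal(0.0, 1.0, size=(fan_out, fan_in))) / np.sqrt(fan_in)

def projected_step(W, grad, lr):
    """Baseline: ordinary gradient step, then clip negative weights to exactly zero."""
    return np.maximum(W - lr * grad, 0.0)

def exp_grad_step(W, grad, lr):
    """Exponentiated-gradient step: W * exp(-lr * grad).
    The factor exp(.) is always positive, so positive weights stay positive.
    Illustrative stand-in only, not the paper's sign-preserving method."""
    return W * np.exp(-lr * grad)

rng = np.random.default_rng(0)
W = nonneg_init(fan_in=4, fan_out=2, rng=rng)
grad = rng.normal(0.0, 5.0, size=W.shape)  # large synthetic gradient, for illustration

W_proj = projected_step(W, grad, lr=0.5)
W_eg = exp_grad_step(W, grad, lr=0.5)
print(int((W_proj == 0).sum()), "weights clipped to exactly zero by projection")
print(bool(np.all(W_eg > 0)), "all weights still positive after the multiplicative step")
```

In an actor network, either update would be applied only to the layers that carry the non-negativity constraint. The choice between them is a design decision the sketch leaves open.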
