AI | LG | Jan 6, 2025

Co-Activation Graph Analysis of Safety-Verified and Explainable Deep Reinforcement Learning Policies

arXiv:2501.03142v1 | 2 citations | h-index: 4 | ICAART
Originality: Synthesis-oriented
AI Analysis

This work addresses safety and interpretability in deep RL, but it appears incremental because it combines two existing techniques rather than introducing a new one.

The paper tackles unsafe and hard-to-interpret deep reinforcement learning policies by combining RL policy model checking with co-activation graph analysis, which gives insight into how a verified policy makes safe decisions. It demonstrates the approach in several experiments.

Deep reinforcement learning (RL) policies can demonstrate unsafe behaviors and are challenging to interpret. To address these challenges, we combine RL policy model checking--a technique for determining whether RL policies exhibit unsafe behaviors--with co-activation graph analysis--a method that maps neural network inner workings by analyzing neuron activation patterns--to gain insight into the safe RL policy's sequential decision-making. This combination lets us interpret the RL policy's inner workings for safe decision-making. We demonstrate its applicability in various experiments.
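
To make the co-activation idea in the abstract concrete, here is a minimal sketch. It is not the paper's implementation. It assumes that hidden-layer activations have already been recorded for a set of states (for example, states reached by a policy that passed model checking), that edge weights are Pearson correlations between neuron activations, and that weak edges are dropped by a cutoff. The function names `co_activation_graph` and `weighted_degree` and the `threshold` parameter are hypothetical.

```python
import numpy as np


def co_activation_graph(activations, threshold=0.5):
    """Build a weighted adjacency matrix from recorded activations.

    activations: array of shape (n_states, n_neurons), one row per state.
    threshold:   assumed cutoff; edges with |correlation| below it are dropped.
    """
    acts = np.asarray(activations, dtype=float)
    n_neurons = acts.shape[1]

    # Neurons with constant activation have undefined correlation,
    # so they are left disconnected instead of producing NaNs.
    varying = acts.std(axis=0) > 0

    corr = np.zeros((n_neurons, n_neurons))
    if varying.sum() > 1:
        corr[np.ix_(varying, varying)] = np.corrcoef(
            acts[:, varying], rowvar=False
        )

    np.fill_diagonal(corr, 0.0)  # no self-loops
    return np.where(np.abs(corr) >= threshold, corr, 0.0)


def weighted_degree(adjacency):
    """Sum of absolute edge weights per neuron, a simple centrality score."""
    return np.abs(adjacency).sum(axis=1)
```

Under these assumptions, neurons with a high weighted degree are the ones that co-activate with many others across the recorded states, which makes them natural starting points for inspecting how the policy reaches its decisions.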

Code Implementations: 1 repo
