Safety-Oriented Pruning and Interpretation of Reinforcement Learning Policies
This work addresses safety and interpretability issues in reinforcement learning for applications that require reliable AI, but it appears incremental, since it builds on existing pruning and model-checking techniques.
The paper tackles a risk of pruning the neural networks behind safe reinforcement learning policies: pruning can remove parameters that are vital to safety. It introduces VERINTER, a method that combines pruning with model checking to ensure safety and interpretability, preserving safety in pruned policies and improving understanding of their safety dynamics.
Pruning neural networks (NNs) can streamline them, but it risks removing parameters that are vital to safe reinforcement learning (RL) policies. We introduce VERINTER, an interpretable RL method that combines NN pruning with model checking to make RL safety interpretable. VERINTER exactly quantifies the effects of pruning and the impact of neural connections on complex safety properties by analyzing changes in safety measurements. The method maintains safety in pruned RL policies, enhances understanding of their safety dynamics, and has proven effective in multiple RL settings.
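The abstract does not spell out the pipeline, so the following is only a toy sketch under our own assumptions, not VERINTER's implementation. It uses a hypothetical six-state slippery corridor MDP, a one-layer linear policy over one-hot state features, and plain value iteration standing in for a probabilistic model checker evaluating the reachability property P=? [ F "unsafe" ]. It illustrates the general idea of comparing a model-checked safety measure before and after pruning individual connections. All names (`safety`, `connection_impact`, `magnitude_prune`, the weights) are illustrative.

```python
# Toy sketch (hypothetical setup): how pruning single policy connections
# changes a model-checked safety measure, P(reach unsafe state).
N_STATES = 6          # states 0..5 on a line
UNSAFE, GOAL = 0, 5   # absorbing states
SLIP = 0.1            # probability of moving opposite to the chosen action
INIT = 2              # initial state
LEFT, RIGHT = 0, 1


def policy_action(weights, s):
    # One-layer "network" over one-hot state features: logit[a] = weights[a][s].
    return RIGHT if weights[RIGHT][s] > weights[LEFT][s] else LEFT  # ties go LEFT


def induced_dtmc(weights):
    # Fixing the deterministic policy turns the MDP into a Markov chain.
    P = [[0.0] * N_STATES for _ in range(N_STATES)]
    for s in range(N_STATES):
        if s in (UNSAFE, GOAL):
            P[s][s] = 1.0
            continue
        step = 1 if policy_action(weights, s) == RIGHT else -1
        P[s][s + step] += 1.0 - SLIP
        P[s][s - step] += SLIP
    return P


def prob_reach_unsafe(P, tol=1e-12, max_iter=100_000):
    # Value iteration for unbounded reachability, P=? [ F "unsafe" ].
    x = [1.0 if s == UNSAFE else 0.0 for s in range(N_STATES)]
    for _ in range(max_iter):
        new = [1.0 if s == UNSAFE else 0.0 if s == GOAL
               else sum(P[s][t] * x[t] for t in range(N_STATES))
               for s in range(N_STATES)]
        if max(abs(a - b) for a, b in zip(new, x)) < tol:
            return new
        x = new
    return x


def safety(weights):
    # Safety measure from the initial state (lower is safer).
    return prob_reach_unsafe(induced_dtmc(weights))[INIT]


def prune(weights, a, s):
    # Copy of the weights with the single connection (a, s) set to zero.
    pruned = [row[:] for row in weights]
    pruned[a][s] = 0.0
    return pruned


def connection_impact(weights):
    # Change in the safety measure caused by pruning each nonzero connection alone.
    base = safety(weights)
    return {(a, s): safety(prune(weights, a, s)) - base
            for a in (LEFT, RIGHT) for s in range(N_STATES)
            if weights[a][s] != 0.0}


def magnitude_prune(weights, k):
    # Classic magnitude pruning: zero the k smallest-magnitude nonzero weights.
    ranked = sorted((abs(weights[a][s]), a, s)
                    for a in (LEFT, RIGHT) for s in range(N_STATES)
                    if weights[a][s] != 0.0)
    pruned = [row[:] for row in weights]
    for _, a, s in ranked[:k]:
        pruned[a][s] = 0.0
    return pruned


W = [[0.0, 0.30, 0.20, 0.05, 0.10, 0.0],   # LEFT logits per state
     [0.0, 0.40, 0.90, 0.08, 0.60, 0.0]]   # RIGHT logits per state

print(f"baseline P(reach unsafe) = {safety(W):.5f}")
for (a, s), delta in sorted(connection_impact(W).items()):
    print(f"prune {('LEFT', 'RIGHT')[a]}@s{s}: delta = {delta:+.5f}")
for k in (1, 2):
    print(f"magnitude-prune k={k}: {safety(magnitude_prune(W, k)):.5f}")
```

In this toy, pruning any RIGHT connection flips the action in its state and raises the unsafe-reachability probability, while pruning LEFT connections leaves it unchanged. Magnitude pruning with k=2 removes the two smallest weights, which happen to decide the action in state 3, so small magnitude is not the same as low safety relevance.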