LG, AI · Nov 27, 2023

Utilizing Explainability Techniques for Reinforcement Learning Model Assurance

Alexander Tapley, Kyle Gatesman, Luis Robaina, Brett Bissey, Joseph Weissman

arXiv:2311.15838v1 · 3.83 citations · h-index: 2 · Has Code

Originality: Synthesis-oriented

AI Analysis

This paper addresses the problem of ensuring reliability and trust in reinforcement learning systems deployed in real-world applications. Its contribution is incremental, since it builds on existing explainability methods.

The paper introduces ARLIN, an open-source toolkit that uses explainability techniques to identify vulnerabilities in trained deep reinforcement learning models before deployment, demonstrated through visualizations and analysis of a public model.

Explainable Reinforcement Learning (XRL) can provide transparency into the decision-making process of a Deep Reinforcement Learning (DRL) model and increase user trust and adoption in real-world use cases. By utilizing XRL techniques, researchers can identify potential vulnerabilities within a trained DRL model prior to deployment, thereby limiting the potential for mission failure or mistakes by the system. This paper introduces the ARLIN (Assured RL Model Interrogation) Toolkit, an open-source Python library that identifies potential vulnerabilities and critical points within trained DRL models through detailed, human-interpretable explainability outputs. To illustrate ARLIN's effectiveness, we provide explainability visualizations and vulnerability analysis for a publicly available DRL model. The open-source code repository is available for download at https://github.com/mitre/arlin.
