LG · Dec 10, 2022

Targeted Adversarial Attacks on Deep Reinforcement Learning Policies via Model Checking

arXiv:2212.05337v1 · 7 citations · h-index: 31
Originality: Incremental advance
AI Analysis

This work addresses targeted adversarial attacks on RL policies and is aimed at researchers and practitioners in AI safety. It is an incremental advance: rather than attacks that merely degrade reward, it focuses on attacks that change temporal logic properties of the policy.

The paper introduces a metric that measures how adversarial attacks on deep reinforcement learning policies affect temporal logic properties, together with a model checking method for verifying a policy's robustness against such attacks. Its experiments confirm that the metric is effective for crafting attacks and that the method gives a concise assessment of robustness.

Deep Reinforcement Learning (RL) agents are susceptible to adversarial noise in their observations that can mislead their policies and decrease their performance. However, an adversary may be interested not only in decreasing the reward, but also in modifying specific temporal logic properties of the policy. This paper presents a metric that measures the exact impact of adversarial attacks against such properties. We use this metric to craft optimal adversarial attacks. Furthermore, we introduce a model checking method that allows us to verify the robustness of RL policies against adversarial attacks. Our empirical analysis confirms (1) the quality of our metric to craft adversarial attacks against temporal logic properties, and (2) that we are able to concisely assess a system's robustness against attacks.
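The abstract does not spell out the metric or the model checking procedure. As a rough illustration of the underlying idea only (not the authors' method), the Python sketch below assumes a small, fully known MDP, a deterministic policy acting on an integer observation, and a reachability property "eventually crash". Fixing the policy's action in every state turns the MDP into a Markov chain, and value iteration on that chain computes the kind of probability a probabilistic model checker returns for a query such as `P=? [F "crash"]`. The impact of an attack is taken here as the change in that probability. All names (`induced_dtmc`, `reach_probability`, `attack_impact`, `is_robust`, `shift_start`) and the toy MDP are hypothetical, and `is_robust` only enumerates a finite set of candidate perturbations rather than verifying against every possible attack.

```python
from typing import Callable, Dict, Iterable, List, Optional, Set, Tuple

# Hypothetical illustration, not the paper's metric or tool.
# Toy finite MDP: transitions[state][action] -> [(next_state, probability), ...]
Transitions = Dict[int, Dict[str, List[Tuple[int, float]]]]
DTMC = Dict[int, List[Tuple[int, float]]]
Policy = Callable[[int], str]        # observation -> action
Observe = Callable[[int], int]       # state -> observation
Attack = Callable[[int, int], int]   # (state, observation) -> perturbed observation


def induced_dtmc(transitions: Transitions, policy: Policy, observe: Observe,
                 attack: Optional[Attack] = None) -> DTMC:
    """Resolve the policy's choice in every state, turning the MDP into a DTMC."""
    dtmc: DTMC = {}
    for s, actions in transitions.items():
        obs = observe(s)
        if attack is not None:
            obs = attack(s, obs)  # the adversary only changes what the agent sees
        dtmc[s] = actions[policy(obs)]
    return dtmc


def reach_probability(dtmc: DTMC, start: int, targets: Set[int],
                      tol: float = 1e-12, max_iter: int = 10_000) -> float:
    """Probability of eventually reaching `targets` from `start`, by value iteration."""
    prob = {s: (1.0 if s in targets else 0.0) for s in dtmc}
    for _ in range(max_iter):
        delta = 0.0
        for s, succ in dtmc.items():
            if s in targets:
                continue  # target states keep probability 1
            new = sum(p * prob[t] for t, p in succ)
            delta = max(delta, abs(new - prob[s]))
            prob[s] = new
        if delta < tol:
            break
    return prob[start]


def attack_impact(transitions: Transitions, policy: Policy, observe: Observe,
                  attack: Attack, start: int, targets: Set[int]) -> Tuple[float, float, float]:
    """Return (clean, attacked, attacked - clean) for the property P(F targets)."""
    clean = reach_probability(induced_dtmc(transitions, policy, observe), start, targets)
    attacked = reach_probability(
        induced_dtmc(transitions, policy, observe, attack), start, targets)
    return clean, attacked, attacked - clean


def is_robust(transitions: Transitions, policy: Policy, observe: Observe,
              attacks: Iterable[Attack], start: int, targets: Set[int],
              max_increase: float) -> bool:
    """True if no attack in the finite candidate set raises P(F targets) by more than max_increase."""
    return all(
        attack_impact(transitions, policy, observe, a, start, targets)[2] <= max_increase
        for a in attacks)


# State 0 = start, 1 = goal (absorbing), 2 = crash (absorbing).
TRANSITIONS: Transitions = {
    0: {"safe": [(1, 0.9), (0, 0.1)], "risky": [(2, 0.5), (1, 0.5)]},
    1: {"safe": [(1, 1.0)], "risky": [(1, 1.0)]},
    2: {"safe": [(2, 1.0)], "risky": [(2, 1.0)]},
}


def observe(s: int) -> int:
    return s  # hypothetical: the observation is just the state index


def policy(obs: int) -> str:
    return "safe" if obs < 5 else "risky"


def shift_start(s: int, obs: int) -> int:
    # Hypothetical attack: push the start state's observation across the decision boundary.
    return obs + 5 if s == 0 else obs


def no_op(s: int, obs: int) -> int:
    return obs


if __name__ == "__main__":
    clean, attacked, delta = attack_impact(TRANSITIONS, policy, observe, shift_start, 0, {2})
    print(f"P(F crash): clean={clean}, attacked={attacked}, impact={delta}")
    print(is_robust(TRANSITIONS, policy, observe, [no_op, shift_start], 0, {2}, 0.1))
```

In this toy example the clean policy never crashes (probability 0.0), while a single perturbed observation at the start state raises the crash probability to 0.5, so the robustness check with a tolerance of 0.1 fails. A real tool would hand the induced chain to a probabilistic model checker and would need a principled way to bound the set of admissible perturbations.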

Code Implementations: 2 repos