SY Mar 11, 2022
Reinforcement Learning for Linear Quadratic Control is Vulnerable Under Cost Manipulation
Yunhan Huang, Quanyan Zhu
In this work, we study the deception of a Linear-Quadratic-Gaussian (LQG) agent by manipulating the cost signals. We show that a small falsification of the cost parameters leads to only a bounded change in the optimal policy, and that the bound is linear in the amount of falsification the attacker can apply to the cost parameters. We propose an attack model in which the attacker aims to mislead the agent into learning a 'nefarious' policy by intentionally falsifying the cost parameters. We formulate the attacker's problem as a convex optimization problem and develop necessary and sufficient conditions to check the achievability of the attacker's goal. We showcase the adversarial manipulation on two types of LQG learners: the batch RL learner and the adaptive dynamic programming (ADP) learner. Our results demonstrate that with only 2.296% falsification of the cost data, the attacker misleads the batch RL learner into learning the 'nefarious' policy that leads the vehicle to a dangerous position. The attacker can also gradually trick the ADP learner into learning the same 'nefarious' policy by consistently feeding the learner a falsified cost signal that stays close to the actual cost signal. The paper aims to raise awareness of the security threats faced by RL-enabled control systems.
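The sensitivity claim in this abstract can be explored numerically with a minimal sketch. It uses a noise-free discrete-time LQR problem as a stand-in for the LQG setting; the system matrices, the falsification direction, and the helper name `lqr_gain` are all hypothetical choices for illustration and are not taken from the paper.

```python
import numpy as np

def lqr_gain(A, B, Q, R, iters=500):
    """Infinite-horizon discrete-time LQR gain via Riccati value iteration."""
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)  # Riccati update written with the current gain
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

# Hypothetical double-integrator-like system (assumed for illustration only).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)            # true state cost
R = np.array([[1.0]])    # true input cost

K_true = lqr_gain(A, B, Q, R)

# Falsify only the position weight in Q, with increasing magnitude eps.
direction = np.diag([1.0, 0.0])
changes = {}
for eps in (0.01, 0.02, 0.04):
    K_fals = lqr_gain(A, B, Q + eps * direction, R)
    changes[eps] = np.linalg.norm(K_fals - K_true)
    print(f"eps={eps:.2f}  ||K_fals - K_true|| = {changes[eps]:.5f}")
```

In this toy setting the printed gain differences should grow roughly in proportion to `eps`, which is consistent with a bound that is linear in the amount of falsification.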
CR Jul 2, 2021
Reinforcement Learning for Feedback-Enabled Cyber Resilience
Yunhan Huang, Linan Huang, Quanyan Zhu
Digitization and remote connectivity have enlarged the attack surface and made cyber systems more vulnerable. As attackers become increasingly sophisticated and resourceful, mere reliance on traditional cyber protection, such as intrusion detection, firewalls, and encryption, is insufficient to secure cyber systems. Cyber resilience provides a new security paradigm that complements inadequate protection with resilience mechanisms. A Cyber-Resilient Mechanism (CRM) adapts to known or zero-day threats and uncertainties in real time and strategically responds to them to maintain the critical functions of the cyber systems in the event of successful attacks. Feedback architectures play a pivotal role in enabling the online sensing, reasoning, and actuation process of the CRM. Reinforcement Learning (RL) is an essential tool that epitomizes the feedback architectures for cyber resilience. It allows the CRM to provide sequential responses to attacks with limited or no prior knowledge of the environment and the attacker. In this work, we review the literature on RL for cyber resilience and discuss cyber resilience against three major types of vulnerabilities, i.e., posture-related, information-related, and human-related vulnerabilities. We introduce three application domains of CRMs: moving target defense, defensive cyber deception, and assistive human security technologies. RL algorithms also have vulnerabilities of their own. We explain the three vulnerabilities of RL and present attack models where the attacker targets the information exchanged between the environment and the agent: the rewards, the state observations, and the action commands. We show that the attacker can trick the RL agent into learning a nefarious policy with minimal attacking effort. Lastly, we discuss the future challenges of RL for cyber security and resilience and emerging applications of RL-based CRMs.
SY Feb 17, 2021
Self-Triggered Markov Decision Processes
Yunhan Huang, Quanyan Zhu
In this paper, we study Markov Decision Processes (MDPs) with self-triggered strategies, extending the idea of self-triggered control to more generic MDP models. This extension makes self-triggering policies applicable to a broader range of systems. We study the co-design of the control policy and the triggering policy to optimize two pre-specified cost criteria. The first cost criterion incorporates a pre-specified update penalty into the traditional MDP cost criteria to reduce the use of communication resources. Under this criterion, a novel dynamic programming (DP) equation, called the DP equation with optimized lookahead, is proposed to solve for the self-triggering policy. The second self-triggering policy maximizes the triggering time while still guaranteeing a pre-specified level of sub-optimality. Theoretical underpinnings are established for the computation and implementation of both policies. Through a gridworld numerical example, we illustrate the two policies' effectiveness in reducing resource consumption and demonstrate the trade-offs between resource consumption and system performance.
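The first criterion, an update penalty added to the usual MDP cost, can be illustrated with a small sketch. This is one plausible reading and not the paper's DP equation with optimized lookahead: it assumes a discounted cost, a fixed penalty paid at every update, and an action held open-loop for a chosen number of steps. The function name `self_triggered_vi`, the random MDP, and all parameter values are hypothetical.

```python
import numpy as np

def self_triggered_vi(P, c, gamma, penalty, max_hold, iters=500):
    """Value iteration over (action, hold time) pairs.

    P: (S, A, S) transition probabilities, c: (S, A) stage costs (minimized).
    At each update the agent pays `penalty`, picks an action, and holds it
    for tau in {1, ..., max_hold} steps before the next update.
    """
    S, A, _ = P.shape
    V = np.zeros(S)
    for _ in range(iters):
        best = np.full(S, np.inf)
        for a in range(A):
            Pa, ca = P[:, a, :], c[:, a]
            held_cost = np.zeros(S)  # expected discounted cost while holding a
            Pk = np.eye(S)           # k-step transition matrix under a
            for tau in range(1, max_hold + 1):
                held_cost = held_cost + gamma ** (tau - 1) * Pk @ ca
                Pk = Pk @ Pa
                total = penalty + held_cost + gamma ** tau * Pk @ V
                best = np.minimum(best, total)
        V = best
    return V

# Small random MDP, assumed purely for illustration.
rng = np.random.default_rng(0)
S, A = 6, 3
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)
c = rng.random((S, A))

V_every_step = self_triggered_vi(P, c, gamma=0.9, penalty=0.2, max_hold=1)
V_self_trig = self_triggered_vi(P, c, gamma=0.9, penalty=0.2, max_hold=3)
print(V_every_step - V_self_trig)  # nonnegative: longer holds can only reduce total cost
```

With `max_hold=1` and `penalty=0` the recursion reduces to standard value iteration, so the sketch can be checked against a conventional MDP solver.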
SY Dec 4, 2020
Cross-Layer Coordinated Attacks on Cyber-Physical Systems: A LQG Game Framework with Controlled Observations
Yunhan Huang, Zehui Xiong, Quanyan Zhu
This work establishes a game-theoretic framework to study cross-layer coordinated attacks on cyber-physical systems (CPSs). The attacker can interfere with the physical process and launch jamming attacks on the communication channels simultaneously. At the same time, the defender can dodge the jamming by dispensing with observations. The generic framework captures a wide variety of classic attack models on CPSs. Leveraging dynamic programming techniques, we fully characterize the Subgame Perfect Equilibrium (SPE) control strategies. We also derive the SPE observation and jamming strategies and provide efficient computational methods to compute them. The results demonstrate that the physical and cyber attacks are coordinated and depend on each other. On the one hand, the control strategies are linear in the state estimate, and the estimate error caused by jamming attacks will induce performance degradation. On the other hand, the interactions between the attacker and the defender in the physical layer significantly impact the observation and jamming strategies. Numerical examples illustrate the interactions between the defender and the attacker through their observation and jamming strategies.
LG Feb 7, 2020
Manipulating Reinforcement Learning: Poisoning Attacks on Cost Signals
Yunhan Huang, Quanyan Zhu
This chapter studies emerging cyber-attacks on reinforcement learning (RL) and introduces a quantitative approach to analyze the vulnerabilities of RL. Focusing on adversarial manipulation of the cost signals, we analyze the performance degradation of TD($λ$) and $Q$-learning algorithms under the manipulation. For TD($λ$), the approximation learned from the manipulated costs has an approximation error bound proportional to the magnitude of the attack. The effect of the adversarial attacks on the bound does not depend on the choice of $λ$. For $Q$-learning, we show that the algorithms converge under stealthy attacks and bounded falsifications of the cost signals. We characterize the relation between the falsified cost and the $Q$-factors, as well as the policy learned by the learning agent, which provides fundamental limits for feasible offensive and defensive moves. We propose a robust region in terms of the cost within which the adversary can never achieve the targeted policy. We provide conditions on the falsified cost under which the agent can be misled into learning the adversary's favored policy. A case study of TD($λ$) learning is provided to corroborate the results.
SY Oct 16, 2019
Dynamic Games for Secure and Resilient Control System Design
Yunhan Huang, Juntao Chen, Linan Huang et al.
Modern control systems are characterized by a hierarchical structure composed of cyber, physical, and human layers. The intricate dependencies among multiple layers and units of modern control systems require an integrated framework to address cross-layer design issues related to security and resilience challenges. To this end, game theory provides a bottom-up modeling paradigm to capture the strategic interactions among multiple components of the complex system and enables a holistic view for understanding and designing cyber-physical-human control systems. In this review, we first provide a multi-layer perspective on increasingly complex and integrated control systems and then introduce several variants of dynamic games for modeling different layers of control systems. We present game-theoretic methods for understanding the fundamental tradeoffs among robustness, security, and resilience and for developing a clean-slate cross-layer approach to enhance system performance in various adversarial environments. This review also includes three quintessential research problems that represent three research directions where dynamic game approaches can bridge multiple research areas and make significant contributions to the design of modern control systems. The paper concludes with a discussion of emerging areas of research that crosscut dynamic games and control systems.
LG Jun 24, 2019
Deceptive Reinforcement Learning Under Adversarial Manipulations on Cost Signals
Yunhan Huang, Quanyan Zhu
This paper studies reinforcement learning (RL) under malicious falsification of cost signals and introduces a quantitative framework of attack models to understand the vulnerabilities of RL. Focusing on $Q$-learning, we show that $Q$-learning algorithms converge under stealthy attacks and bounded falsifications of the cost signals. We characterize the relation between the falsified cost and the $Q$-factors, as well as the policy learned by the learning agent, which provides fundamental limits for feasible offensive and defensive moves. We propose a robust region in terms of the cost within which the adversary can never achieve the targeted policy. We provide conditions on the falsified cost under which the agent can be misled into learning the adversary's favored policy. A numerical case study of water reservoir control is provided to show the potential hazards of RL in learning-based control systems and to corroborate the results.
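The relation between a falsified cost and the resulting $Q$-factors can be illustrated with a minimal sketch. It replaces sample-based $Q$-learning with exact $Q$-value iteration on a known model, standing in for the limit that $Q$-learning approaches, and checks the standard contraction bound $\|\tilde{Q}^* - Q^*\|_\infty \le \|\tilde{c} - c\|_\infty / (1 - \gamma)$. The random MDP, the falsification `delta`, and the function name are hypothetical and not taken from the paper.

```python
import numpy as np

def q_value_iteration(P, c, gamma, iters=2000):
    """P: (S, A, S) transition probabilities, c: (S, A) stage costs (minimized)."""
    Q = np.zeros_like(c)
    for _ in range(iters):
        Q = c + gamma * P @ Q.min(axis=1)  # Bellman optimality update for costs
    return Q

# Small random MDP, assumed purely for illustration.
rng = np.random.default_rng(0)
S, A, gamma = 5, 3, 0.9
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)
c_true = rng.random((S, A))

# Bounded falsification of the cost signal (hypothetical attack).
delta = 0.05 * rng.uniform(-1.0, 1.0, size=(S, A))
c_fals = c_true + delta

Q_true = q_value_iteration(P, c_true, gamma)
Q_fals = q_value_iteration(P, c_fals, gamma)

gap = np.abs(Q_fals - Q_true).max()
bound = np.abs(delta).max() / (1.0 - gamma)
print(f"max |Q_fals - Q_true| = {gap:.4f}, bound = {bound:.4f}")
print("greedy policy changed:", bool(np.any(Q_fals.argmin(axis=1) != Q_true.argmin(axis=1))))
```

Whether the greedy policy flips depends on how close the true $Q$-factors of competing actions are, which is the intuition behind a robust region of costs inside which the learned policy cannot be changed.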