Yanbing Mao

LG
h-index: 3
8 papers
21 citations
Novelty: 53%
AI Score: 36

8 Papers

LG · Mar 29, 2023
Physical Deep Reinforcement Learning Towards Safety Guarantee

Hongpeng Cao, Yanbing Mao, Lui Sha et al.

Deep reinforcement learning (DRL) has achieved tremendous success in many complex decision-making tasks of autonomous systems with high-dimensional state and/or action spaces. However, safety and stability remain major concerns that hinder the application of DRL to safety-critical autonomous systems. To address these concerns, we propose Phy-DRL: a physical deep reinforcement learning framework. Phy-DRL is novel in two architectural designs: i) a Lyapunov-like reward and ii) residual control (i.e., the integration of physics-model-based control and data-driven control). Together, the physical reward and residual control endow Phy-DRL with mathematically provable safety and stability guarantees. Through experiments on an inverted pendulum, we show that Phy-DRL features guaranteed safety and stability and enhanced robustness, while offering remarkably accelerated training and increased reward.
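The two designs can be pictured with a minimal sketch. It assumes a discrete-time linear nominal model x_{k+1} = A x_k + B u_k, a linear physics-model-based controller F, and a quadratic Lyapunov-like function V(x) = x^T P x. The helper names and the reward form V(x_k) - V(x_{k+1}) are illustrative assumptions, not the paper's exact formulation.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def mat_vec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def quad_form(p: Matrix, x: Vector) -> float:
    """Evaluate the quadratic form x^T P x."""
    return sum(xi * yi for xi, yi in zip(x, mat_vec(p, x)))


def residual_action(f: Matrix, x: Vector, drl_action: Vector) -> Vector:
    """Residual control: model-based action F x plus a data-driven correction."""
    return [u_phy + u_drl for u_phy, u_drl in zip(mat_vec(f, x), drl_action)]


def step(a: Matrix, b: Matrix, x: Vector, u: Vector) -> Vector:
    """Hypothetical nominal linear model x_{k+1} = A x_k + B u_k."""
    return [ax + bu for ax, bu in zip(mat_vec(a, x), mat_vec(b, u))]


def lyapunov_like_reward(p: Matrix, x: Vector, x_next: Vector) -> float:
    """Assumed reward: one-step decrease of V(x) = x^T P x (positive if V decreases)."""
    return quad_form(p, x) - quad_form(p, x_next)
```

In this sketch the DRL agent only learns the correction term added to F x, so a zero DRL output falls back to the model-based controller.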

LG · Sep 27, 2022
Phy-Taylor: Physics-Model-Based Deep Neural Networks

Yanbing Mao, Lui Sha, Huajie Shao et al.

Purely data-driven deep neural networks (DNNs) applied to physical engineering systems can infer relations that violate physical laws, leading to unexpected consequences. To address this challenge, we propose a physics-model-based DNN framework, called Phy-Taylor, that accelerates the learning of representations compliant with physical knowledge. The Phy-Taylor framework makes two key contributions: it introduces a new architectural Physics-compatible neural network (PhN), and it features a novel compliance mechanism that we call Physics-guided Neural Network Editing. The PhN aims to directly capture nonlinearities inspired by physical quantities, such as kinetic energy, potential energy, electrical power, and aerodynamic drag force. To do so, the PhN augments neural network layers with two key components: (i) monomials of the Taylor series expansion of nonlinear functions capturing physical knowledge, and (ii) a suppressor for mitigating the influence of noise. The neural-network editing mechanism further modifies network links and activation functions consistently with physical knowledge. As an extension, we also propose a self-correcting Phy-Taylor framework that introduces two additional capabilities: (i) physics-model-based safety relationship learning, and (ii) automatic output correction when safety violations occur. Through experiments, we show that (by expressing hard-to-learn nonlinearities directly and by constraining dependencies) Phy-Taylor features considerably fewer parameters and a remarkably accelerated training process, while offering enhanced model robustness and accuracy.
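Component (i) of the PhN can be illustrated with a plain feature expansion: if the input already contains the degree-2 monomial v^2, a single linear unit can represent kinetic energy (1/2) m v^2 exactly instead of approximating it. This is a sketch that assumes a full expansion up to a chosen order; it omits the noise suppressor and the editing mechanism, and the function names are hypothetical.

```python
from itertools import combinations_with_replacement
from math import prod
from typing import List


def taylor_monomials(x: List[float], order: int) -> List[float]:
    """All monomials of the inputs up to the given total degree, constant term first.

    For x = [x1, x2] and order 2 this returns [1, x1, x2, x1^2, x1*x2, x2^2].
    """
    features = [1.0]
    for degree in range(1, order + 1):
        # Each multiset of input indices of size `degree` yields one monomial.
        for idx in combinations_with_replacement(range(len(x)), degree):
            features.append(prod(x[i] for i in idx))
    return features


def linear_readout(weights: List[float], features: List[float]) -> float:
    """A single linear unit applied to the monomial features."""
    return sum(w * f for w, f in zip(weights, features))


# Kinetic energy (1/2) m v^2 with m = 2: the weight on the v^2 feature is 0.5 * 2.
v = 3.0
energy = linear_readout([0.0, 0.0, 1.0], taylor_monomials([v], 2))
```

The number of features grows combinatorially with input dimension and order, which is one reason a physics-guided choice of which monomials to keep matters in practice.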

SI · Mar 16, 2018
Spread of Information with Confirmation Bias in Cyber-Social Networks

Yanbing Mao, Sadegh Bolouki, Emrah Akyol

This paper provides a model to investigate information spreading over a cyber-social network of agents communicating with each other. The cyber-social network considered here comprises individuals and news agencies. Each individual holds a belief represented by a scalar. Individuals receive information from news agencies that are closer to their belief; in this way, confirmation bias is explicitly incorporated into the model. The proposed dynamics of cyber-social networks are based on the DeGroot-Friedkin model, where an individual's opinion-update mechanism is a convex combination of his innate opinion, his neighbors' opinions at the previous time step (obtained from the social network), and the opinions passed along by the news agencies in the cyber layer that he follows. The characteristics of the interdependent social and cyber networks are radically different here: the social network relies on trust and is hence static, while the news agencies are highly dynamic, since they are weighted as a function of the distance between an individual's state and the news agency's state to account for confirmation bias. The conditions for convergence of these dynamics to a unique equilibrium are characterized. Estimates and exact computations of the steady-state values are provided under nonlinear and linear state-dependent weight functions. Finally, the impact of polarization in the news agencies' opinions on the evolution of public opinion is numerically analyzed in the context of Krackhardt's well-known advice network.
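A minimal sketch of one synchronous update of this kind of dynamics follows. The specific weights are assumptions for illustration only: every individual uses the same innate and social weights, each row of the trust matrix sums to 1, and the remaining weight is split over news agencies in proportion to exp(-decay * |x_i - z_m|), so agencies closer to an individual's belief count more. The paper studies its own linear and nonlinear state-dependent weight functions, which are not reproduced here.

```python
import math
from typing import List


def update_opinions(
    x: List[float],
    innate: List[float],
    trust: List[List[float]],
    news: List[float],
    innate_w: float,
    social_w: float,
    decay: float = 1.0,
) -> List[float]:
    """One step of a DeGroot-Friedkin-style update with confirmation bias (hypothetical weights)."""
    media_w = 1.0 - innate_w - social_w
    new_x = []
    for i, xi in enumerate(x):
        # Static, trust-based social influence (row of `trust` sums to 1).
        social = sum(t * xj for t, xj in zip(trust[i], x))
        # State-dependent media weights: closer agencies get larger weight.
        raw = [math.exp(-decay * abs(xi - z)) for z in news]
        total = sum(raw)
        media = sum(r / total * z for r, z in zip(raw, news))
        # Convex combination of innate opinion, neighbours, and followed media.
        new_x.append(innate_w * innate[i] + social_w * social + media_w * media)
    return new_x


# Two individuals with polarized innate opinions and two polarized news agencies.
x0 = [0.2, 0.8]
trust = [[0.5, 0.5], [0.5, 0.5]]
news = [0.0, 1.0]
x = list(x0)
for _ in range(200):
    x = update_opinions(x, x0, trust, news, innate_w=0.3, social_w=0.4)
```

Because every update is a convex combination, opinions stay within the range spanned by the innate opinions and the news agencies; whether they converge to a unique equilibrium in general is exactly what the paper's conditions address.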

SY · Mar 4, 2019
Strategic Topology Switching for Security-Part I: Consensus & Switching Times

Yanbing Mao, Emrah Akyol, Ziang Zhang

In this two-part paper, we consider strategic topology switching for second-order multi-agent systems under a special class of stealthy attacks, namely the "zero-dynamics" attack (ZDA). The main tool proposed here is to strategically switch the network topology to detect a possible ZDA. However, it is not clear a priori whether such a switching strategy still yields consensus in the switched system in the normal (unattacked) operating mode. In Part I, we propose a strategy on the switching times that enables the topology-switching algorithm proposed in Part II to reach second-order consensus in the absence of a ZDA. Utilizing the theory of stable switched linear systems with unstable subsystems, we characterize sufficient conditions on the dwell time of the topology-switching signal to reach consensus. Building on this characterization, we then propose a decentralized time-dependent topology-switching algorithm. The proposed algorithm, used in conjunction with a simplified control protocol, achieves consensus while providing substantial advantages over other control approaches: it relies only on relative position measurements (without any requirement for velocity measurements), and it does not impose any constraint on the magnitudes of coupling weights. Finally, we demonstrate our theoretical findings via numerical simulations.
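The dwell time is the minimum interval a topology must stay active between two switches. The sketch below only builds a time-dependent switching signal that cycles through a set of topologies and checks a dwell-time constraint. The cyclic order and the function names are hypothetical, and the sufficient dwell-time bound itself, which comes from the paper's analysis, is treated as a given input.

```python
from typing import List, Tuple


def periodic_switching_signal(
    num_topologies: int, dwell_time: float, horizon: float
) -> List[Tuple[float, int]]:
    """Cycle through topologies 0..n-1, holding each for `dwell_time`.

    Returns (switching_time, topology_index) pairs covering [0, horizon).
    """
    if dwell_time <= 0:
        raise ValueError("dwell_time must be positive")
    schedule = []
    k = 0
    while k * dwell_time < horizon:
        schedule.append((k * dwell_time, k % num_topologies))
        k += 1
    return schedule


def respects_dwell_time(switch_times: List[float], min_dwell: float) -> bool:
    """Check that consecutive switching instants are at least `min_dwell` apart."""
    return all(t1 - t0 >= min_dwell for t0, t1 in zip(switch_times, switch_times[1:]))


def active_topology(schedule: List[Tuple[float, int]], t: float) -> int:
    """Index of the topology active at time t (the last switch at or before t)."""
    current = schedule[0][1]
    for time, idx in schedule:
        if time <= t:
            current = idx
    return current


schedule = periodic_switching_signal(num_topologies=3, dwell_time=0.5, horizon=2.0)
```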

LG · Sep 5, 2024
Simplex-enabled Safe Continual Learning Machine

Hongpeng Cao, Yanbing Mao, Yihao Cai et al.

This paper proposes the SeC-Learning Machine: Simplex-enabled safe continual learning for safety-critical autonomous systems. The SeC-learning machine is built on Simplex logic (that is, "using simplicity to control complexity") and physics-regulated deep reinforcement learning (Phy-DRL). It thus consists of an HP (high-performance)-Student, an HA (high-assurance)-Teacher, and a Coordinator. Specifically, the HP-Student is a pre-trained, high-performance but not fully verified Phy-DRL that continues to learn in a real plant to tune its action policy to be safe. In contrast, the HA-Teacher is a mission-reduced, physics-model-based, and verified design. As a complement, the HA-Teacher has two missions: backing up safety and correcting unsafe learning. The Coordinator triggers the interaction and the switch between the HP-Student and the HA-Teacher. Powered by these three interactive components, the SeC-learning machine can i) assure lifetime safety (i.e., a safety guarantee in any continual-learning stage, regardless of the HP-Student's success or convergence), ii) address the Sim2Real gap, and iii) learn to tolerate unknown unknowns in real plants. Experiments on a cart-pole system and a real quadruped robot demonstrate the distinguishing features of the SeC-learning machine compared with continual learning built on state-of-the-art safe DRL frameworks with approaches to addressing the Sim2Real gap.
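The Coordinator's switching role can be sketched as a Simplex-style rule. This assumes the triggering set is an ellipsoid {x : x^T P x <= threshold}, that the HA-Teacher takes over whenever the state leaves it, and that the teacher's (state, action) pairs are logged for later use in correcting the student. The names and the ellipsoidal condition are assumptions; the paper's Coordinator may trigger differently.

```python
from typing import Callable, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Policy = Callable[[Vector], Vector]


def quad_form(p: Matrix, x: Vector) -> float:
    """Evaluate x^T P x."""
    return sum(xi * sum(pij * xj for pij, xj in zip(row, x)) for xi, row in zip(x, p))


def coordinate(
    x: Vector,
    student: Policy,
    teacher: Policy,
    p: Matrix,
    threshold: float,
    corrections: List[Tuple[Vector, Vector]],
) -> Tuple[Vector, str]:
    """Simplex-style switch between the HP-Student and the HA-Teacher.

    Assumed rule: the student acts while x^T P x <= threshold; otherwise the
    teacher acts, and its (state, action) pair is recorded as correction data.
    """
    if quad_form(p, x) <= threshold:
        return student(x), "HP-Student"
    action = teacher(x)
    corrections.append((list(x), action))
    return action, "HA-Teacher"
```

Keeping the switch rule this simple mirrors the Simplex idea: the verified logic that decides who acts stays small even when the student is a large learned policy.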

RO · Oct 30, 2025
Real-DRL: Teach and Learn in Reality

Yanbing Mao, Yihao Cai, Lui Sha

This paper introduces the Real-DRL framework for safety-critical autonomous systems, enabling runtime learning of a deep reinforcement learning (DRL) agent to develop safe and high-performance action policies in real plants (i.e., real physical systems to be controlled), while prioritizing safety. Real-DRL consists of three interactive components: a DRL-Student, a PHY-Teacher, and a Trigger. The DRL-Student is a DRL agent that innovates in its dual self-learning and teaching-to-learn paradigm and in real-time safety-informed batch sampling. The PHY-Teacher, on the other hand, is a physics-model-based design of action policies that focuses solely on safety-critical functions. PHY-Teacher is novel in its real-time patch for two key missions: i) fostering the teaching-to-learn paradigm for DRL-Student and ii) backing up the safety of real plants. The Trigger manages the interaction between the DRL-Student and the PHY-Teacher. Powered by these three interactive components, Real-DRL can effectively address safety challenges that arise from unknown unknowns and the Sim2Real gap. Additionally, Real-DRL notably features i) assured safety, ii) automatic hierarchy learning (i.e., safety-first learning followed by high-performance learning), and iii) safety-informed batch sampling to address the learning-experience imbalance caused by corner cases. Experiments with a real quadruped robot, a quadruped robot in NVIDIA Isaac Gym, and a cart-pole system, along with comparisons and ablation studies, demonstrate Real-DRL's effectiveness and unique features.
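One way to picture batch sampling that counters the learning-experience imbalance is a sampler that reserves a fixed share of each batch for rare corner-case experiences. This is a hypothetical mixing scheme for illustration: how experiences are classified as corner cases, and the `corner_fraction` parameter, are assumptions rather than the paper's method.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def safety_informed_batch(
    regular: Sequence[T],
    corner_cases: Sequence[T],
    batch_size: int,
    corner_fraction: float,
    rng: random.Random,
) -> List[T]:
    """Draw a batch that over-represents rare corner-case experiences.

    A fixed fraction of the batch comes from the corner-case buffer (capped by
    its size); the rest is drawn from the regular buffer without replacement.
    """
    n_corner = min(len(corner_cases), round(batch_size * corner_fraction))
    n_regular = min(len(regular), batch_size - n_corner)
    batch = rng.sample(list(corner_cases), n_corner) + rng.sample(list(regular), n_regular)
    rng.shuffle(batch)  # avoid ordering effects inside the batch
    return batch


regular = list(range(100))
corner = list(range(1000, 1005))
batch = safety_informed_batch(regular, corner, batch_size=20, corner_fraction=0.25, rng=random.Random(0))
```

With uniform sampling, five corner cases among 105 experiences would appear in a batch of 20 only about once on average; the fixed share makes them appear five times.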

RO · Dec 17, 2024
Physics-model-guided Worst-case Sampling for Safe Reinforcement Learning

Hongpeng Cao, Yanbing Mao, Lui Sha et al.

Real-world accidents in learning-enabled cyber-physical systems (CPS) frequently occur in challenging corner cases. During the training of a deep reinforcement learning (DRL) policy, the standard setup either fixes the training conditions at a single initial condition or samples them uniformly from the admissible state space. This setup often overlooks challenging but safety-critical corner cases. To bridge this gap, this paper proposes a physics-model-guided worst-case sampling strategy for training safe policies that can handle safety-critical cases toward guaranteed safety. Furthermore, we integrate the proposed worst-case sampling strategy into the physics-regulated deep reinforcement learning (Phy-DRL) framework to build a more data-efficient and safe learning algorithm for safety-critical CPS. We validate the proposed training strategy with Phy-DRL through extensive experiments on a simulated cart-pole system, a 2D quadrotor, and both a simulated and a real quadruped robot, showing remarkably improved sampling efficiency in learning more robust safe policies.
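A minimal sketch of what "physics-model-guided" selection of worst-case initial conditions could look like: roll each candidate forward under a nominal model and keep the candidates whose predicted trajectories come closest to the safety boundary. It assumes an autonomous linear nominal model x_{t+1} = A x_t and measures closeness by the peak of V(x) = x^T P x; both the ranking criterion and the function names are hypothetical, not the paper's algorithm.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def mat_vec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def quad_form(p: Matrix, x: Vector) -> float:
    """Evaluate x^T P x."""
    return sum(xi * yi for xi, yi in zip(x, mat_vec(p, x)))


def predicted_peak(a: Matrix, p: Matrix, x0: Vector, horizon: int) -> float:
    """Largest V(x_t) along the nominal rollout x_{t+1} = A x_t for t = 0..horizon."""
    x = list(x0)
    peak = quad_form(p, x)
    for _ in range(horizon):
        x = mat_vec(a, x)
        peak = max(peak, quad_form(p, x))
    return peak


def worst_case_initial_conditions(
    candidates: List[Vector], a: Matrix, p: Matrix, horizon: int, k: int
) -> List[Vector]:
    """Keep the k candidates with the largest predicted peak (the hardest cases)."""
    ranked = sorted(candidates, key=lambda x0: predicted_peak(a, p, x0, horizon), reverse=True)
    return ranked[:k]


A = [[1.1, 0.0], [0.0, 0.5]]
P = [[1.0, 0.0], [0.0, 1.0]]
chosen = worst_case_initial_conditions([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]], A, P, horizon=3, k=2)
```

Here the candidate along the unstable direction is ranked first, even though all three start inside the unit ball, which is the kind of case uniform sampling can easily under-represent.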

AI · May 26, 2023
Physics-Regulated Deep Reinforcement Learning: Invariant Embeddings

Hongpeng Cao, Yanbing Mao, Lui Sha et al.

This paper proposes Phy-DRL: a physics-regulated deep reinforcement learning (DRL) framework for safety-critical autonomous systems. Phy-DRL has three distinctive invariant-embedding designs: i) a residual action policy (i.e., integrating a data-driven DRL action policy and a physics-model-based action policy), ii) an automatically constructed safety-embedded reward, and iii) physics-model-guided neural network (NN) editing, including link editing and activation editing. Theoretically, Phy-DRL exhibits 1) a mathematically provable safety guarantee and 2) strict compliance of the critic and actor networks with physics knowledge about the action-value function and action policy. Finally, we evaluate Phy-DRL on a cart-pole system and a quadruped robot. The experiments validate our theoretical results and demonstrate that Phy-DRL features guaranteed safety compared with purely data-driven DRL and solely model-based design, while requiring remarkably fewer learning parameters and offering fast training toward a safety guarantee.
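Link editing and activation editing from design iii) can be pictured as a dense layer with two kinds of switches: a binary mask that removes links physics knowledge rules out, and a per-output flag that leaves an output linear. This is a hypothetical illustration; tanh is an arbitrary activation choice, and the rules by which the paper derives the mask and flags from physics knowledge are not reproduced.

```python
import math
from typing import List


def edited_layer(
    weights: List[List[float]],
    bias: List[float],
    link_mask: List[List[int]],
    activate: List[bool],
    x: List[float],
) -> List[float]:
    """Dense layer with physics-guided edits (hypothetical illustration).

    link_mask[i][j] = 0 removes the link from input j to output i, e.g. when
    output i is known not to depend on input j.
    activate[i] = False leaves output i linear (no activation), e.g. when that
    output is known to be linear in its inputs.
    """
    out = []
    for w_row, m_row, b, act in zip(weights, link_mask, bias, activate):
        z = b + sum(w * m * xj for w, m, xj in zip(w_row, m_row, x))
        out.append(math.tanh(z) if act else z)
    return out


W = [[1.0, 2.0], [3.0, 4.0]]
mask = [[1, 0], [1, 1]]
y = edited_layer(W, [0.0, 0.0], mask, [False, True], [1.0, 1.0])
```

Because the masked weight never touches the output, the first output is invariant to the second input no matter how training updates `W`, which is the kind of strict compliance the abstract refers to.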