Weiye Zhao

RO
h-index17
14papers
271citations
Novelty49%
AI Score30

14 Papers

ROAug 19, 2024Code
Physics-Aware Combinatorial Assembly Sequence Planning using Data-free Action Masking

Ruixuan Liu, Alan Chen, Weiye Zhao et al.

Combinatorial assembly uses standardized unit primitives to build objects that satisfy user specifications. This paper studies assembly sequence planning (ASP) for physical combinatorial assembly. Given the shape of the desired object, the goal is to find a sequence of actions for placing unit primitives to build the target object. In particular, we aim to ensure the planned assembly sequence is physically executable. However, ASP for combinatorial assembly is particularly challenging due to its combinatorial nature. To address the challenge, we employ deep reinforcement learning to learn a construction policy for placing unit primitives sequentially to build the desired object. Specifically, we design an online physics-aware action mask that filters out invalid actions, which effectively guides policy learning and ensures violation-free deployment. In the end, we apply the proposed method to Lego assembly with more than 250 3D structures. The experiment results demonstrate that the proposed method plans physically valid assembly sequences to build all structures, achieving a $100\%$ success rate, whereas the best comparable baseline fails more than $40$ structures. Our implementation is available at \url{https://github.com/intelligent-control-lab/PhysicsAwareCombinatorialASP}.

LGFeb 6, 2023
State-wise Safe Reinforcement Learning: A Survey

Weiye Zhao, Tairan He, Rui Chen et al.

Despite the tremendous success of Reinforcement Learning (RL) algorithms in simulation environments, applying RL to real-world applications still faces many challenges. A major concern is safety, in another word, constraint satisfaction. State-wise constraints are one of the most common constraints in real-world applications and one of the most challenging constraints in Safe RL. Enforcing state-wise constraints is necessary and essential to many challenging tasks such as autonomous driving, robot manipulation. This paper provides a comprehensive review of existing approaches that address state-wise constraints in RL. Under the framework of State-wise Constrained Markov Decision Process (SCMDP), we will discuss the connections, differences, and trade-offs of existing approaches in terms of (i) safety guarantee and scalability, (ii) safety and reward performance, and (iii) safety after convergence and during training. We also summarize limitations of current methods and discuss potential future directions.

LGJan 24, 2023
AutoCost: Evolving Intrinsic Cost for Zero-violation Reinforcement Learning

Tairan He, Weiye Zhao, Changliu Liu

Safety is a critical hurdle that limits the application of deep reinforcement learning (RL) to real-world control tasks. To this end, constrained reinforcement learning leverages cost functions to improve safety in constrained Markov decision processes. However, such constrained RL methods fail to achieve zero violation even when the cost limit is zero. This paper analyzes the reason for such failure, which suggests that a proper cost function plays an important role in constrained RL. Inspired by the analysis, we propose AutoCost, a simple yet effective framework that automatically searches for cost functions that help constrained RL to achieve zero-violation performance. We validate the proposed method and the searched cost function on the safe RL benchmark Safety Gym. We compare the performance of augmented agents that use our cost function to provide additive intrinsic costs with baseline agents that use the same policy learners but with only extrinsic costs. Results show that the converged policies with intrinsic costs in all environments achieve zero constraint violation and comparable performance with baselines.

LGJun 21, 2023
State-wise Constrained Policy Optimization

Weiye Zhao, Rui Chen, Yifan Sun et al.

Reinforcement Learning (RL) algorithms have shown tremendous success in simulation environments, but their application to real-world problems faces significant challenges, with safety being a major concern. In particular, enforcing state-wise constraints is essential for many challenging tasks such as autonomous driving and robot manipulation. However, existing safe RL algorithms under the framework of Constrained Markov Decision Process (CMDP) do not consider state-wise constraints. To address this gap, we propose State-wise Constrained Policy Optimization (SCPO), the first general-purpose policy search algorithm for state-wise constrained reinforcement learning. SCPO provides guarantees for state-wise constraint satisfaction in expectation. In particular, we introduce the framework of Maximum Markov Decision Process, and prove that the worst-case safety violation is bounded under SCPO. We demonstrate the effectiveness of our approach on training neural network policies for extensive robot locomotion tasks, where the agent must satisfy a variety of state-wise safety constraints. Our results show that SCPO significantly outperforms existing methods and can handle state-wise constraints in high-dimensional robotics tasks.

SYNov 12, 2023
Learning Predictive Safety Filter via Decomposition of Robust Invariant Set

Zeyang Li, Chuxiong Hu, Weiye Zhao et al.

Ensuring safety of nonlinear systems under model uncertainty and external disturbances is crucial, especially for real-world control tasks. Predictive methods such as robust model predictive control (RMPC) require solving nonconvex optimization problems online, which leads to high computational burden and poor scalability. Reinforcement learning (RL) works well with complex systems, but pays the price of losing rigorous safety guarantee. This paper presents a theoretical framework that bridges the advantages of both RMPC and RL to synthesize safety filters for nonlinear systems with state- and action-dependent uncertainty. We decompose the robust invariant set (RIS) into two parts: a target set that aligns with terminal region design of RMPC, and a reach-avoid set that accounts for the rest of RIS. We propose a policy iteration approach for robust reach-avoid problems and establish its monotone convergence. This method sets the stage for an adversarial actor-critic deep RL algorithm, which simultaneously synthesizes a reach-avoid policy network, a disturbance policy network, and a reach-avoid value network. The learned reach-avoid policy network is utilized to generate nominal trajectories for online verification, which filters potentially unsafe actions that may drive the system into unsafe regions when worst-case disturbances are applied. We formulate a second-order cone programming (SOCP) approach for online verification using system level synthesis, which optimizes for the worst-case reach-avoid value of any possible trajectories. The proposed safety filter requires much lower computational complexity than RMPC and still enjoys persistent robust safety guarantee. The effectiveness of our method is illustrated through a numerical example.

LGOct 20, 2023
Absolute Policy Optimization

Weiye Zhao, Feihan Li, Yifan Sun et al.

In recent years, trust region on-policy reinforcement learning has achieved impressive results in addressing complex control tasks and gaming scenarios. However, contemporary state-of-the-art algorithms within this category primarily emphasize improvement in expected performance, lacking the ability to control over the worst-case performance outcomes. To address this limitation, we introduce a novel objective function, optimizing which leads to guaranteed monotonic improvement in the lower probability bound of performance with high confidence. Building upon this groundbreaking theoretical advancement, we further introduce a practical solution called Absolute Policy Optimization (APO). Our experiments demonstrate the effectiveness of our approach across challenging continuous control benchmark tasks and extend its applicability to mastering Atari games. Our findings reveal that APO as well as its efficient variation Proximal Absolute Policy Optimization (PAPO) significantly outperforms state-of-the-art policy gradient algorithms, resulting in substantial improvements in worst-case performance, as well as expected performance.

ROApr 18, 2021Code
Provably Safe Tolerance Estimation for Robot Arms via Sum-of-Squares Programming

Weiye Zhao, Suqin He, Changliu Liu

Tolerance estimation problems are prevailing in engineering applications. For example, in modern robotics, it remains challenging to efficiently estimate joint tolerance, \ie the maximal allowable deviation from a reference robot state such that safety constraints are still satisfied. This paper presented an efficient algorithm to estimate the joint tolerance using sum-of-squares programming. It is theoretically proved that the algorithm provides a tight lower bound of the joint tolerance. Extensive numerical studies demonstrate that the proposed method is computationally efficient and near optimal. The algorithm is implemented in the JTE toolbox and is available at \url{https://github.com/intelligent-control-lab/Sum-of-Square-Safety-Optimization}.

ROMay 18, 2024
Meta-Control: Automatic Model-based Control Synthesis for Heterogeneous Robot Skills

Tianhao Wei, Liqian Ma, Rui Chen et al.

The requirements for real-world manipulation tasks are diverse and often conflicting; some tasks require precise motion while others require force compliance; some tasks require avoidance of certain regions, while others require convergence to certain states. Satisfying these varied requirements with a fixed state-action representation and control strategy is challenging, impeding the development of a universal robotic foundation model. In this work, we propose Meta-Control, the first LLM-enabled automatic control synthesis approach that creates customized state representations and control strategies tailored to specific tasks. Our core insight is that a meta-control system can be built to automate the thought process that human experts use to design control systems. Specifically, human experts heavily use a model-based, hierarchical (from abstract to concrete) thought model, then compose various dynamic models and controllers together to form a control system. Meta-Control mimics the thought model and harnesses LLM's extensive control knowledge with Socrates' "art of midwifery" to automate the thought process. Meta-Control stands out for its fully model-based nature, allowing rigorous analysis, generalizability, robustness, efficient parameter tuning, and reliable real-time execution.

ROMay 4, 2024
Implicit Safe Set Algorithm for Provably Safe Reinforcement Learning

Weiye Zhao, Feihan Li, Changliu Liu

Deep reinforcement learning (DRL) has demonstrated remarkable performance in many continuous control tasks. However, a significant obstacle to the real-world application of DRL is the lack of safety guarantees. Although DRL agents can satisfy system safety in expectation through reward shaping, designing agents to consistently meet hard constraints (e.g., safety specifications) at every time step remains a formidable challenge. In contrast, existing work in the field of safe control provides guarantees on persistent satisfaction of hard safety constraints. However, these methods require explicit analytical system dynamics models to synthesize safe control, which are typically inaccessible in DRL settings. In this paper, we present a model-free safe control algorithm, the implicit safe set algorithm, for synthesizing safeguards for DRL agents that ensure provable safety throughout training. The proposed algorithm synthesizes a safety index (barrier certificate) and a subsequent safe control law solely by querying a black-box dynamic function (e.g., a digital twin simulator). Moreover, we theoretically prove that the implicit safe set algorithm guarantees finite time convergence to the safe set and forward invariance for both continuous-time and discrete-time systems. We validate the proposed algorithm on the state-of-the-art Safety Gym benchmark, where it achieves zero safety violations while gaining $95\% \pm 9\%$ cumulative reward compared to state-of-the-art safe DRL methods. Furthermore, the resulting algorithm scales well to high-dimensional systems with parallel computing.

LGMay 23, 2023
GUARD: A Safe Reinforcement Learning Benchmark

Weiye Zhao, Yifan Sun, Feihan Li et al.

Due to the trial-and-error nature, it is typically challenging to apply RL algorithms to safety-critical real-world applications, such as autonomous driving, human-robot interaction, robot manipulation, etc, where such errors are not tolerable. Recently, safe RL (i.e. constrained RL) has emerged rapidly in the literature, in which the agents explore the environment while satisfying constraints. Due to the diversity of algorithms and tasks, it remains difficult to compare existing safe RL algorithms. To fill that gap, we introduce GUARD, a Generalized Unified SAfe Reinforcement Learning Development Benchmark. GUARD has several advantages compared to existing benchmarks. First, GUARD is a generalized benchmark with a wide variety of RL agents, tasks, and safety constraint specifications. Second, GUARD comprehensively covers state-of-the-art safe RL algorithms with self-contained implementations. Third, GUARD is highly customizable in tasks and algorithms. We present a comparison of state-of-the-art safe RL algorithms in various task settings using GUARD and establish baselines that future work can build on.

ROAug 9, 2020
Contact-Rich Trajectory Generation in Confined Environments Using Iterative Convex Optimization

Weiye Zhao, Suqin He, Chengtao Wen et al.

Applying intelligent robot arms in dynamic uncertain environments (i.e., flexible production lines) remains challenging, which requires efficient algorithms for real time trajectory generation. The motion planning problem for robot trajectory generation is highly nonlinear and nonconvex, which usually comes with collision avoidance constraints, robot kinematics and dynamics constraints, and task constraints (e.g., following a Cartesian trajectory defined on a surface and maintain the contact). The nonlinear and nonconvex planning problem is computationally expensive to solve, which limits the application of robot arms in the real world. In this paper, for redundant robot arm planning problems with complex constraints, we present a motion planning method using iterative convex optimization that can efficiently handle the constraints and generate optimal trajectories in real time. The proposed planner guarantees the satisfaction of the contact-rich task constraints and avoids collision in confined environments. Extensive experiments on trajectory generation for weld grinding are performed to demonstrate the effectiveness of the proposed method and its applicability in advanced robotic manufacturing.

ROJan 27, 2020
Experimental Evaluation of Human Motion Prediction: Toward Safe and Efficient Human Robot Collaboration

Weiye Zhao, Liting Sun, Changliu Liu et al.

Human motion prediction is non-trivial in modern industrial settings. Accurate prediction of human motion can not only improve efficiency in human robot collaboration, but also enhance human safety in close proximity to robots. Among existing prediction models, the parameterization and identification methods of those models vary. It remains unclear what is the necessary parameterization of a prediction model, whether online adaptation of the model is necessary, and whether prediction can help improve safety and efficiency during human robot collaboration. These problems result from the difficulty to quantitatively evaluate various prediction models in a closed-loop fashion in real human-robot interaction settings. This paper develops a method to evaluate the closed-loop performance of different prediction models. In particular, we compare models with different parameterizations and models with or without online parameter adaptation. Extensive experiments were conducted on a human robot collaboration platform. The experimental results demonstrated that human motion prediction significantly enhanced the collaboration efficiency and human safety. Adaptable prediction models that were parameterized by neural networks achieved the best performance.

RODec 19, 2019
Safe Adaptation with Multiplicative Uncertainties Using Robust Safe Set Algorithm

Charles Noren, Weiye Zhao, Changliu Liu

Maintaining safety under adaptation has long been considered to be an important capability for autonomous systems. As these systems estimate and change the ego-model of the system dynamics, questions regarding how to develop safety guarantees for such systems continue to be of interest. We propose a novel robust safe control methodology that uses set-based safety constraints to make a robotic system with dynamical uncertainties safely adapt and operate in its environment. The method consists of designing a scalar energy function (safety index) for an adaptive system with parametric uncertainty and an optimization-based approach for control synthesis. Simulation studies on a two-link manipulator are conducted and the results demonstrate the effectiveness of our proposed method in terms of generating provably safe control for adaptive systems with parametric uncertainty.

ROOct 1, 2018
Human Motion Prediction using Semi-adaptable Neural Networks

Yujiao Cheng, Weiye Zhao, Changliu Liu et al.

Human motion prediction is an important component to facilitate human robot interaction. Robots need to accurately predict human's future movement in order to safely plan its own motion trajectories and efficiently collaborate with humans. Many recent approaches predict human's movement using deep learning methods, such as recurrent neural networks. However, existing methods lack the ability to adapt to time-varying human behaviors, and many of them do not quantify uncertainties in the prediction. This paper proposes an approach that uses a semi-adaptable neural network for human motion prediction, and provides uncertainty bounds of the predictions in real time. In particular, a neural network is trained offline to represent the human motion transition model, and then recursive least square parameter adaptation algorithm (RLS-PAA) is adopted for online parameter adaptation of the neural network and for uncertainty estimation. Experiments on several human motion datasets verify that the proposed method significantly outperforms the state-of-the-art approach in terms of prediction accuracy and computation efficiency.