ROJul 29, 2023
Using Implicit Behavior Cloning and Dynamic Movement Primitive to Facilitate Reinforcement Learning for Robot Motion PlanningZengjie Zhang, Jayden Hong, Amir Soufi Enayati et al.
Reinforcement learning (RL) for motion planning of multi-degree-of-freedom robots still suffers from low efficiency in terms of slow training speed and poor generalizability. In this paper, we propose a novel RL-based robot motion planning framework that uses implicit behavior cloning (IBC) and dynamic movement primitive (DMP) to improve the training speed and generalizability of an off-policy RL agent. IBC utilizes human demonstration data to leverage the training speed of RL, and DMP serves as a heuristic model that transfers motion planning into a simpler planning space. To support this, we also create a human demonstration dataset using a pick-and-place experiment that can be used for similar studies. Comparison studies in simulation reveal the advantage of the proposed method over the conventional RL agents with faster training speed and higher scores. A real-robot experiment indicates the applicability of the proposed method to a simple assembly task. Our work provides a novel perspective on using motion primitives and human demonstration to leverage the performance of RL for robot applications.
ROJun 5, 2023
Risk-Aware Reward Shaping of Reinforcement Learning Agents for Autonomous DrivingLin-Chi Wu, Zengjie Zhang, Sofie Haesaert et al.
Reinforcement learning (RL) is an effective approach to motion planning in autonomous driving, where an optimal driving policy can be automatically learned using the interaction data with the environment. Nevertheless, the reward function for an RL agent, which is significant to its performance, is challenging to be determined. The conventional work mainly focuses on rewarding safe driving states but does not incorporate the awareness of risky driving behaviors of the vehicles. In this paper, we investigate how to use risk-aware reward shaping to leverage the training and test performance of RL agents in autonomous driving. Based on the essential requirements that prescribe the safety specifications for general autonomous driving in practice, we propose additional reshaped reward terms that encourage exploration and penalize risky driving behaviors. A simulation study in OpenAI Gym indicates the advantage of risk-aware reward shaping for various RL agents. Also, we point out that proximal policy optimization (PPO) is likely to be the best RL method that works with risk-aware reward shaping.
ROApr 12, 2023
Facilitating Sim-to-real by Intrinsic Stochasticity of Real-Time Simulation in Reinforcement Learning for Robot ManipulationRam Dershan, Amir M. Soufi Enayati, Zengjie Zhang et al.
Simulation is essential to reinforcement learning (RL) before implementation in the real world, especially for safety-critical applications like robot manipulation. Conventionally, RL agents are sensitive to the discrepancies between the simulation and the real world, known as the sim-to-real gap. The application of domain randomization, a technique used to fill this gap, is limited to the imposition of heuristic-randomized models. {We investigate the properties of intrinsic stochasticity of real-time simulation (RT-IS) of off-the-shelf simulation software and its potential to improve RL performance. This improvement includes a higher tolerance to noise and model imprecision and superiority to conventional domain randomization in terms of ease of use and automation. Firstly, we conduct analytical studies to measure the correlation of RT-IS with the utilization of computer hardware and validate its comparability with the natural stochasticity of a physical robot. Then, we exploit the RT-IS feature in the training of an RL agent. The simulation and physical experiment results verify the feasibility and applicability of RT-IS to robust agent training for robot manipulation tasks. The RT-IS-powered RL agent outperforms conventional agents on robots with modeling uncertainties. RT-IS requires less heuristic randomization, is not task-dependent, and achieves better generalizability than the conventional domain-randomization-powered agents. Our findings provide a new perspective on the sim-to-real problem in practical applications like robot manipulation tasks.
ROApr 12, 2023
Human-Robot Skill Transfer with Enhanced Compliance via Dynamic Movement PrimitivesJayden Hong, Zengjie Zhang, Amir M. Soufi Enayati et al.
Finding an efficient way to adapt robot trajectory is a priority to improve overall performance of robots. One approach for trajectory planning is through transferring human-like skills to robots by Learning from Demonstrations (LfD). The human demonstration is considered the target motion to mimic. However, human motion is typically optimal for human embodiment but not for robots because of the differences between human biomechanics and robot dynamics. The Dynamic Movement Primitives (DMP) framework is a viable solution for this limitation of LfD, but it requires tuning the second-order dynamics in the formulation. Our contribution is introducing a systematic method to extract the dynamic features from human demonstration to auto-tune the parameters in the DMP framework. In addition to its use with LfD, another utility of the proposed method is that it can readily be used in conjunction with Reinforcement Learning (RL) for robot training. In this way, the extracted features facilitate the transfer of human skills by allowing the robot to explore the possible trajectories more efficiently and increasing robot compliance significantly. We introduced a methodology to extract the dynamic features from multiple trajectories based on the optimization of human-likeness and similarity in the parametric space. Our method was implemented into an actual human-robot setup to extract human dynamic features and used to regenerate the robot trajectories following both LfD and RL with DMP. It resulted in a stable performance of the robot, maintaining a high degree of human-likeness based on accumulated distance error as good as the best heuristic tuning.
ROApr 12, 2023
Sample-Efficient Reinforcement Learning with Symmetry-Guided Demonstrations for Robotic ManipulationAmir M. Soufi Enayati, Zengjie Zhang, Kashish Gupta et al.
Reinforcement learning (RL) suffers from low sample efficiency, particularly in high-dimensional continuous state-action spaces of complex robotic manipulation tasks. RL performance can improve by leveraging prior knowledge, even when demonstrations are limited and collected from simplified environments. To address this, we define General Abstract Symmetry (GAS) for aggregating demonstrations from symmetrical abstract partitions of the robot environment. We introduce Demo-EASE, a novel training framework using a dual-buffer architecture that stores both demonstrations and RL-generated experiences. Demo-EASE improves sample efficiency through symmetry-guided demonstrations and behavior cloning, enabling strong initialization and balanced exploration-exploitation. Demo-EASE is compatible with both on-policy and off-policy RL algorithms, supporting various training regimes. We evaluate our framework in three simulation experiments using a Kinova Gen3 robot with joint-space control within PyBullet. Our results show that Demo-EASE significantly accelerates convergence and improves final performance compared to standard RL baselines, demonstrating its potential for efficient real-world robotic manipulation learning.
ROSep 14, 2024
VernaCopter: Disambiguated Natural-Language-Driven Robot via Formal SpecificationsTeun van de Laar, Zengjie Zhang, Shuhao Qi et al.
It has been an ambition of many to control a robot for a complex task using natural language (NL). The rise of large language models (LLMs) makes it closer to coming true. However, an LLM-powered system still suffers from the ambiguity inherent in an NL and the uncertainty brought up by LLMs. This paper proposes a novel LLM-based robot motion planner, named \textit{VernaCopter}, with signal temporal logic (STL) specifications serving as a bridge between NL commands and specific task objectives. The rigorous and abstract nature of formal specifications allows the planner to generate high-quality and highly consistent paths to guide the motion control of a robot. Compared to a conventional NL-prompting-based planner, the proposed VernaCopter planner is more stable and reliable due to less ambiguous uncertainty. Its efficacy and advantage have been validated by two small but challenging experimental scenarios, implying its potential in designing NL-driven robots.
ROAug 23, 2023
Identifying Reaction-Aware Driving Styles of Stochastic Model Predictive Controlled Vehicles by Inverse Reinforcement LearningNi Dang, Tao Shi, Zengjie Zhang et al.
The driving style of an Autonomous Vehicle (AV) refers to how it behaves and interacts with other AVs. In a multi-vehicle autonomous driving system, an AV capable of identifying the driving styles of its nearby AVs can reliably evaluate the risk of collisions and make more reasonable driving decisions. However, there has not been a consistent definition of driving styles for an AV in the literature, although it is considered that the driving style is encoded in the AV's trajectories and can be identified using Maximum Entropy Inverse Reinforcement Learning (ME-IRL) methods as a cost function. Nevertheless, an important indicator of the driving style, i.e., how an AV reacts to its nearby AVs, is not fully incorporated in the feature design of previous ME-IRL methods. In this paper, we describe the driving style as a cost function of a series of weighted features. We design additional novel features to capture the AV's reaction-aware characteristics. Then, we identify the driving styles from the demonstration trajectories generated by the Stochastic Model Predictive Control (SMPC) using a modified ME-IRL method with our newly proposed features. The proposed method is validated using MATLAB simulation and an off-the-shelf experiment.