RO · Sep 3, 2022
Reinforcement Learning with Prior Policy Guidance for Motion Planning of Dual-Arm Free-Floating Space Robot
Yuxue Cao, Shengjie Wang, Xiang Zheng et al.
Reinforcement learning (RL) methods are a promising technique and have achieved superior results in the motion planning of free-floating space robots. However, because the planning dimension increases and the dynamic coupling of the system intensifies, motion planning for dual-arm free-floating space robots remains an open challenge. In particular, current studies cannot handle the task of capturing a non-cooperative object because they lack a pose constraint on the end-effectors. To address this problem, we propose a novel algorithm, EfficientLPT, that helps RL-based methods improve planning accuracy efficiently. Our core contributions are constructing a mixed policy guided by prior knowledge and introducing the infinity norm to build a more reasonable reward function. Furthermore, our method successfully captures a rotating object at different spinning speeds.
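To make the two contributions more concrete, here is a minimal sketch under stated assumptions, not EfficientLPT's actual formulation. It assumes the reward is the negative infinity norm of a flat vector stacking the pose errors of both end-effectors, and that the mixed policy is a simple convex blend of a prior action and a learned action with a weight `beta`. The functions `pose_error_reward` and `mixed_action` and the parameter `beta` are hypothetical names introduced for illustration.

```python
import math
from typing import List, Sequence


def inf_norm(v: Sequence[float]) -> float:
    """Infinity norm: the largest absolute component of v."""
    return max(abs(x) for x in v)


def l2_norm(v: Sequence[float]) -> float:
    """Euclidean norm, included only for comparison."""
    return math.sqrt(sum(x * x for x in v))


def pose_error_reward(errors: Sequence[float]) -> float:
    """Hypothetical reward: the negative of the single worst error component.

    `errors` is assumed to stack the position and orientation errors of
    both end-effectors into one flat vector.
    """
    return -inf_norm(errors)


def mixed_action(prior: Sequence[float], learned: Sequence[float], beta: float) -> List[float]:
    """Hypothetical mixed policy: a convex blend of a prior action and an RL action.

    beta = 1 follows the prior entirely; beta = 0 follows the learned policy.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    if len(prior) != len(learned):
        raise ValueError("actions must have the same length")
    return [beta * p + (1.0 - beta) * a for p, a in zip(prior, learned)]
```

One plausible motivation for the infinity norm: the error vectors `[0.1, 0.1, 0.1, 0.1]` and `[0.0, 0.0, 0.0, 0.2]` have the same Euclidean norm (about 0.2), but their infinity norms are 0.1 and 0.2, so an infinity-norm reward penalizes the case where the whole error sits on a single pose component.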
RO · Jul 6, 2022
A Learning System for Motion Planning of Free-Float Dual-Arm Space Manipulator towards Non-Cooperative Object
Shengjie Wang, Yuxue Cao, Xiang Zheng et al.
Recent years have seen the emergence of non-cooperative objects in space, such as failed satellites and space junk. These objects are usually operated on or collected by free-float dual-arm space manipulators. Because they eliminate the difficulties of modeling and manual parameter tuning, reinforcement learning (RL) methods have shown promise in the trajectory planning of space manipulators. Although previous studies demonstrate their effectiveness, these methods cannot be applied to tracking dynamic targets with unknown rotation (non-cooperative objects). In this paper, we propose a learning system for motion planning of a free-float dual-arm space manipulator (FFDASM) towards non-cooperative objects. Specifically, our method consists of two modules. Module I performs multi-target trajectory planning for the two end-effectors within a large target space. Module II takes point clouds of the non-cooperative object as input, estimates the object's motion properties, and predicts the positions of target points on the object. By combining Module I and Module II, we successfully track target points on a spinning object whose motion regularity is unknown. The experiments also demonstrate the scalability and generalization of our learning system.
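Module II ends by predicting where target points on the object will be. As a hedged illustration of that last step only, the sketch below assumes the estimation stage has already produced a rotation center, a spin axis, and a constant spin rate, and then propagates a body-fixed target point forward in time with Rodrigues' rotation formula. The constant-rate motion model and the names `rotate` and `predict_target_point` are assumptions for illustration; the paper's module estimates motion from point clouds and may use a different model.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def rotate(v: Vec3, axis: Vec3, angle: float) -> Vec3:
    """Rotate v about `axis` by `angle` radians using Rodrigues' formula."""
    n = math.sqrt(dot(axis, axis))
    if n == 0.0:
        raise ValueError("rotation axis must be non-zero")
    k = (axis[0] / n, axis[1] / n, axis[2] / n)  # normalize to a unit axis
    c, s = math.cos(angle), math.sin(angle)
    kxv = cross(k, v)
    kdv = dot(k, v)
    return (v[0] * c + kxv[0] * s + k[0] * kdv * (1.0 - c),
            v[1] * c + kxv[1] * s + k[1] * kdv * (1.0 - c),
            v[2] * c + kxv[2] * s + k[2] * kdv * (1.0 - c))


def predict_target_point(point: Vec3, center: Vec3, axis: Vec3,
                         omega: float, dt: float) -> Vec3:
    """Predict where a body-fixed target point will be after `dt` seconds.

    Hypothetical motion model: the object spins at a constant rate `omega`
    (rad/s) about `axis`, which passes through `center`.
    """
    rel = (point[0] - center[0], point[1] - center[1], point[2] - center[2])
    moved = rotate(rel, axis, omega * dt)
    return (center[0] + moved[0], center[1] + moved[1], center[2] + moved[2])
```

For example, a point at `(1, 0, 0)` on an object spinning at π/2 rad/s about the z-axis through the origin ends up near `(0, 1, 0)` after one second. A planner such as Module I could then be given the predicted point as its goal instead of the currently observed one.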
RO · Jan 2, 2023
A Policy Optimization Method Towards Optimal-time Stability
Shengjie Wang, Fengbo Lan, Xiang Zheng et al.
In current model-free reinforcement learning (RL) algorithms, sampling-based stability criteria are commonly used to guide policy optimization. However, these criteria only guarantee infinite-time (asymptotic) convergence of the system's state to an equilibrium point, which leads to a suboptimal policy. In this paper, we propose a policy optimization technique that incorporates sampling-based Lyapunov stability. Our approach enables the system's state to reach an equilibrium point within an optimal time and remain stable thereafter, a property we call "optimal-time stability". To achieve this, we integrate the optimization method into the Actor-Critic framework, yielding the Adaptive Lyapunov-based Actor-Critic (ALAC) algorithm. In evaluations on ten robotic tasks, our approach significantly outperforms previous methods and effectively guides the system to generate stable patterns.
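The gap between infinite-time and finite-time convergence can be made concrete. A sampled exponential-decrease condition such as L(s') - L(s) <= -alpha * L(s) only drives the Lyapunov value L toward zero asymptotically, whereas the standard continuous-time condition dL/dt <= -c * L^p with c > 0 and 0 < p < 1 makes L reach zero in finite time, no later than L(0)^(1-p) / (c(1-p)). The sketch below evaluates both under those assumptions. It is a generic illustration, not ALAC's actual stability constraint, and `decrease_violation`, `sampled_lyapunov_penalty`, and `finite_time_bound` are hypothetical names.

```python
from typing import Iterable, Tuple


def decrease_violation(l_now: float, l_next: float, alpha: float) -> float:
    """How far one sampled transition violates L(s') - L(s) <= -alpha * L(s).

    Returns 0.0 when the decrease condition holds.
    """
    return max(0.0, l_next - l_now + alpha * l_now)


def sampled_lyapunov_penalty(transitions: Iterable[Tuple[float, float]],
                             alpha: float) -> float:
    """Mean violation over sampled (L(s), L(s')) pairs, e.g. as a loss penalty."""
    values = [decrease_violation(l_now, l_next, alpha) for l_now, l_next in transitions]
    if not values:
        raise ValueError("need at least one sampled transition")
    return sum(values) / len(values)


def finite_time_bound(l0: float, c: float, p: float) -> float:
    """Upper bound on the time for L to reach 0 when dL/dt <= -c * L**p, 0 < p < 1.

    Integrating d(L**(1-p))/dt <= -c * (1 - p) gives this settling-time bound.
    """
    if not (c > 0.0 and 0.0 < p < 1.0):
        raise ValueError("requires c > 0 and 0 < p < 1")
    return l0 ** (1.0 - p) / (c * (1.0 - p))
```

For example, with L(0) = 4, c = 1, and p = 0.5, the bound is 4 time units, while the exponential-decrease condition alone gives no finite settling time.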
RO · Mar 27, 2023
A Learning-based Adaptive Compliance Method for Symmetric Bi-manual Manipulation
Yuxue Cao, Wenbo Zhao, Shengjie Wang et al.
Symmetric bi-manual manipulation is an essential skill in on-orbit operations because of its high load capacity. Previous works have applied compliant control to maintain the stability of manipulation. However, traditional methods treat motion planning and compliant control as two separate modules, which can cause conflicts when the desired trajectory and the impedance parameters change simultaneously in the presence of external forces and disturbances. Additionally, using these two modules together requires experts to tune parameters manually. To achieve high efficiency while enhancing adaptability, we propose a novel Learning-based Adaptive Compliance (LAC) algorithm that improves the efficiency and robustness of symmetric bi-manual manipulation. First, LAC integrates desired-trajectory generation and impedance-parameter adjustment in a unified framework to mitigate conflicts and improve efficiency. Second, we introduce a centralized Actor-Critic framework with LSTM networks that preprocess the force states, enhancing the synchronization of bi-manual manipulation. When evaluated in dual-arm peg-in-hole assembly experiments, our method outperforms baseline algorithms in terms of optimality and robustness.
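One way to read the unified framework is that a single policy output sets both where an arm should go and how stiffly it should respond. A minimal sketch of that coupling follows, assuming a per-axis Cartesian impedance law with zero desired velocity and clipped stiffness and damping. `ImpedanceAction`, `impedance_force`, and the bound values are hypothetical and not taken from the LAC paper.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ImpedanceAction:
    """Hypothetical per-arm policy output: trajectory offset plus impedance gains."""
    delta_x: List[float]    # offset added to the reference position, per axis
    stiffness: List[float]  # K, per axis
    damping: List[float]    # D, per axis


def clip(value: float, bounds: Tuple[float, float]) -> float:
    lo, hi = bounds
    return max(lo, min(hi, value))


def impedance_force(action: ImpedanceAction, x_ref: List[float], x: List[float],
                    v: List[float],
                    k_bounds: Tuple[float, float] = (50.0, 1000.0),
                    d_bounds: Tuple[float, float] = (1.0, 100.0)) -> List[float]:
    """Per-axis command F = K * (x_d - x) - D * v, with x_d = x_ref + delta_x.

    Assumes a zero desired velocity. K and D are clipped to illustrative
    bounds so the learned gains stay within a fixed range.
    """
    forces = []
    for i in range(len(x)):
        k = clip(action.stiffness[i], k_bounds)
        d = clip(action.damping[i], d_bounds)
        x_desired = x_ref[i] + action.delta_x[i]  # learned trajectory adjustment
        forces.append(k * (x_desired - x[i]) - d * v[i])
    return forces
```

For example, an offset of 0.01 with K = 500 and D = 20 at velocity 0.1 gives a command of 5.0 - 2.0 = 3.0 on that axis. In a dual-arm setting, a centralized policy would emit one such action per arm at each step, so the trajectory and the gains change together rather than being tuned by separate modules.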