Multi-Phase Multi-Objective Dexterous Manipulation with Adaptive Hierarchical Curriculum
This paper addresses multi-phase, multi-objective manipulation in robotics. The contribution appears incremental: it builds on existing deep reinforcement learning methods and adds a novel reward mechanism.
The paper tackles the difficulty robots have in learning optimal policies for dexterous manipulation tasks whose multiple objectives change priority over time. It develops an Adaptive Hierarchical Reward Mechanism (AHRM) that adapts the reward hierarchy to these changing priorities. The authors report improved task performance and learning efficiency in simulation and in physical experiments with a JACO robot arm.
Dexterous manipulation tasks usually have multiple objectives, and the priorities of these objectives may vary across the phases of a task. Varying priorities make it difficult, or even impossible, for a robot to learn an optimal policy with a deep reinforcement learning (DRL) method. To solve this problem, we develop a novel Adaptive Hierarchical Reward Mechanism (AHRM) to guide the DRL agent in learning manipulation tasks with multiple prioritized objectives. The AHRM determines the objective priorities during the learning process and updates the reward hierarchy to adapt to the changing priorities at different phases. The proposed method is validated on a multi-objective manipulation task with a JACO robot arm, in which the robot must manipulate a target surrounded by obstacles. Simulation and physical experiment results show that the proposed method improves both task performance and learning efficiency.
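The abstract does not give the AHRM's update rule, so the following is only a rough Python sketch of the general idea of a reward hierarchy that is re-ranked during learning. Everything in it is an assumption made for illustration and not taken from the paper: the class name `AdaptiveHierarchicalReward`, the objective names, the geometric `decay` weighting between hierarchy levels, the moving-average `smoothing` rate, and the rule that the objective currently performing worst is given top priority.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AdaptiveHierarchicalReward:
    """Illustrative sketch only; not the paper's AHRM implementation."""

    objectives: List[str]
    decay: float = 0.5       # assumed weight ratio between adjacent hierarchy levels
    smoothing: float = 0.1   # assumed moving-average rate for tracked performance
    performance: Dict[str, float] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Start every objective with a tracked performance of zero.
        for name in self.objectives:
            self.performance.setdefault(name, 0.0)

    def hierarchy(self) -> List[str]:
        # Assumed priority rule: the objective with the lowest tracked
        # performance is ranked first. sorted() is stable, so ties keep
        # the order in which the objectives were listed.
        return sorted(self.objectives, key=lambda name: self.performance[name])

    def update(self, scores: Dict[str, float]) -> None:
        # Exponential moving average of each objective's score in [0, 1].
        for name in self.objectives:
            old = self.performance[name]
            self.performance[name] = old + self.smoothing * (scores[name] - old)

    def reward(self, scores: Dict[str, float]) -> float:
        # Weighted sum in which each lower hierarchy level counts `decay` times less.
        total = 0.0
        for level, name in enumerate(self.hierarchy()):
            total += (self.decay ** level) * scores[name]
        return total


# Hypothetical two-objective task: reach the target and avoid obstacles.
ahr = AdaptiveHierarchicalReward(objectives=["reach", "avoid_obstacles"])
step_scores = {"reach": 1.0, "avoid_obstacles": 0.0}

first_hierarchy = ahr.hierarchy()
first_reward = ahr.reward(step_scores)

ahr.update(step_scores)  # reaching is going well, obstacle avoidance is not
second_hierarchy = ahr.hierarchy()
second_reward = ahr.reward(step_scores)
```

In this toy run the hierarchy flips after one update: obstacle avoidance moves to the top level because it lags behind, so the same step scores earn a smaller scalar reward. A DRL agent would receive the value returned by `reward` at each step, and `update` would be called once per step or per episode, depending on how quickly the priorities are meant to shift.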