RO · Feb 28, 2022

Hierarchical Policy Learning for Mechanical Search

Oussama Zenkri, Ngo Anh Vien, Gerhard Neumann

arXiv:2202.13680v1 · 4.04 citations

Originality: Highly original

AI Analysis

This paper addresses the challenge of efficient object retrieval in robotics and reports a significant improvement over rule-based methods. The contribution is incremental in that it builds on existing mechanical search and RL frameworks.

The paper tackles the problem of retrieving objects from clutter using mechanical search. It formulates the problem as a hierarchical partially observable Markov decision process (POMDP) and proposes a hierarchical policy learning approach based on deep reinforcement learning. The learned sub-policies raise the success rate from less than 32% to nearly 80% and cut the computation time for push actions from multiple seconds to less than 10 milliseconds.

Retrieving objects from clutter is a complex task that requires multiple interactions with the environment until the target object can be extracted. These interactions involve executing action primitives such as grasping or pushing, as well as setting priorities for the objects to manipulate and the actions to execute. Mechanical Search (MS) is a framework for object retrieval that uses a heuristic algorithm for pushing and rule-based algorithms for high-level planning. While rule-based policies profit from human intuition in how they work, they often perform sub-optimally. Deep reinforcement learning (RL) has shown great performance in complex tasks such as making decisions from raw pixel input, which makes it suitable for training policies in the context of object retrieval. In this work, we first formulate the MS problem in a principled way as a hierarchical POMDP. Based on this formulation, we propose a hierarchical policy learning approach for the MS problem. For demonstration, we present two main parameterized sub-policies: a push policy and an action selection policy. When integrated into the hierarchical POMDP's policy, our proposed sub-policies increase the success rate of retrieving the target object from less than 32% to nearly 80%, while reducing the computation time for push actions from multiple seconds to less than 10 milliseconds.
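
The two-level structure described in the abstract, an action selection policy that picks a primitive and a sub-policy that parameterizes it, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the observation layout `(x, y, occlusion)`, the names `select_primitive`, `push_policy`, and `grasp_policy`, and the hand-written threshold that stands in for the learned networks.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Hypothetical observation: (target_x, target_y, occlusion in [0, 1]).
Observation = Tuple[float, float, float]


@dataclass
class Action:
    primitive: str              # "push" or "grasp"
    params: Tuple[float, ...]   # parameters of the chosen primitive


def push_policy(obs: Observation) -> Tuple[float, ...]:
    # Stand-in for a learned push policy: push at the target position
    # with a fixed direction angle of 0.0 (illustrative only).
    x, y, _ = obs
    return (x, y, 0.0)


def grasp_policy(obs: Observation) -> Tuple[float, ...]:
    # Stand-in for a grasp primitive: grasp at the target position.
    x, y, _ = obs
    return (x, y)


def select_primitive(obs: Observation) -> str:
    # Stand-in for a learned action selection policy: push while the
    # target is mostly occluded, otherwise attempt a grasp.
    _, _, occlusion = obs
    return "push" if occlusion > 0.5 else "grasp"


SUB_POLICIES: Dict[str, Callable[[Observation], Tuple[float, ...]]] = {
    "push": push_policy,
    "grasp": grasp_policy,
}


def hierarchical_policy(obs: Observation) -> Action:
    # High level: choose which primitive to execute.
    primitive = select_primitive(obs)
    # Low level: let the matching sub-policy produce its parameters.
    return Action(primitive, SUB_POLICIES[primitive](obs))
```

In this sketch, calling `hierarchical_policy((0.1, 0.2, 0.9))` selects a push because the hypothetical occlusion value exceeds the threshold, while a lower occlusion value leads to a grasp. In a learned system both levels would be trained policies operating on image observations rather than the fixed rules used here.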
