Learning 6-DoF Grasping and Pick-Place Using Attention Focus
This work addresses robot manipulation in unstructured environments. Its contribution is incremental: it builds on existing MDP frameworks and adds new constraints on the allowed actions.
The paper tackles 6-DoF robotic grasping and pick-place in cluttered scenes by formulating the problem as a Markov decision process with hierarchical SE(3) sampling constraints. The approach succeeds on three challenging tasks, both in simulation and on a real robot, with all training done in simulation.
We address a class of manipulation problems where the robot perceives the scene with a depth sensor and can move its end effector in a space with six degrees of freedom -- 3D position and orientation. Our approach is to formulate the problem as a Markov decision process (MDP) with abstract yet generally applicable state and action representations. Finding a good solution to the MDP requires adding constraints on the allowed actions. We develop a specific set of constraints called hierarchical $\text{SE}(3)$ sampling (HSE3S) which causes the robot to learn a sequence of gazes to focus attention on the task-relevant parts of the scene. We demonstrate the effectiveness of our approach on three challenging pick-place tasks (with novel objects in clutter and nontrivial places) both in simulation and on a real robot, even though all training is done in simulation.
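To make the idea of a "sequence of gazes" concrete, the following is a minimal sketch of coarse-to-fine position selection. It is not the paper's HSE3S implementation: the function names, the grid size, the halving schedule, and `toy_score` (a hand-written stand-in for a learned value estimate) are all assumptions made for illustration, and the orientation part of the end-effector pose is omitted entirely.

```python
import itertools
import math


def grid_offsets(half_width, n=3):
    """Return n**3 offsets on a regular grid spanning [-half_width, half_width]^3."""
    steps = [half_width * (2 * i / (n - 1) - 1) for i in range(n)]
    return list(itertools.product(steps, repeat=3))


def hierarchical_position_search(score, center, half_width, levels=3, n=3):
    """Coarse-to-fine search: pick the best grid point, then zoom in around it.

    `score` is a hypothetical callable mapping a 3D point to a number; in a
    learned system it would be replaced by a value estimate computed from
    sensor data cropped around the candidate point.
    """
    focus = tuple(center)
    trace = [focus]  # the sequence of attended points, coarse to fine
    for _ in range(levels):
        candidates = [
            tuple(f + o for f, o in zip(focus, offset))
            for offset in grid_offsets(half_width, n)
        ]
        focus = max(candidates, key=score)
        trace.append(focus)
        half_width /= 2  # narrow the attended volume at the next level
    return focus, trace


# Toy stand-in for a learned score: closer to a fixed target is better.
TARGET = (0.27, -0.12, 0.06)


def toy_score(point):
    return -math.dist(point, TARGET)


best, trace = hierarchical_position_search(toy_score, (0.0, 0.0, 0.0), 0.4)
print("attended points:", trace)
print("final error:", math.dist(best, TARGET))
```

Each level evaluates only 27 candidates, so three levels cost 81 evaluations, whereas a single flat grid at the finest spacing over the same workspace would need several hundred. That saving is the general motivation for constraining actions hierarchically, under the assumptions stated above.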