Multi-View Fusion for Multi-Level Robotic Scene Understanding
This work addresses scene understanding for robotic manipulation, but its contribution is incremental: it fuses existing techniques rather than introducing major new methods.
The paper tackles multi-level scene awareness for robotic manipulation with a system that, from RGB images, fuses a point cloud representation of scene surfaces, rough pose estimates for unknown objects, and full 6-DoF poses for known objects, and it demonstrates that the three are complementary.
We present a system for multi-level scene awareness for robotic manipulation. Given a sequence of camera-in-hand RGB images, the system calculates three types of information: 1) a point cloud representation of all the surfaces in the scene, for the purpose of obstacle avoidance; 2) the rough pose of unknown objects from categories corresponding to primitive shapes (e.g., cuboids and cylinders); and 3) full 6-DoF pose of known objects. By developing and fusing recent techniques in these domains, we provide a rich scene representation for robot awareness. We demonstrate the importance of each of these modules, their complementary nature, and the potential benefits of the system in the context of robotic manipulation.
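To make the three levels of output listed above concrete, the following is a minimal sketch of a container that could hold them side by side. It is an illustration under stated assumptions, not the system's actual data model: the class names (`PrimitiveObject`, `KnownObject`, `SceneRepresentation`), the helper `obstacles_near`, the (w, x, y, z) quaternion convention, and the example object name "mug" are all hypothetical and do not come from the paper.

```python
from dataclasses import dataclass, field
from math import dist
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Quat = Tuple[float, float, float, float]  # (w, x, y, z); assumed convention


@dataclass
class PrimitiveObject:
    """Unknown object approximated by a primitive shape (level 2)."""
    category: str      # e.g. "cuboid" or "cylinder"
    center: Vec3       # rough position in a common world frame
    dimensions: Vec3   # rough extents of the primitive


@dataclass
class KnownObject:
    """Known object with a full 6-DoF pose (level 3)."""
    name: str          # hypothetical label for the known object
    position: Vec3     # 3 translational degrees of freedom
    orientation: Quat  # 3 rotational degrees of freedom, as a unit quaternion


@dataclass
class SceneRepresentation:
    """Holds all three levels of scene information together."""
    points: List[Vec3] = field(default_factory=list)  # level 1: surfaces
    primitives: List[PrimitiveObject] = field(default_factory=list)
    known_objects: List[KnownObject] = field(default_factory=list)

    def obstacles_near(self, query: Vec3, radius: float) -> List[Vec3]:
        """Return surface points within `radius` of `query` (brute force)."""
        return [p for p in self.points if dist(p, query) <= radius]


# Toy scene with made-up values, only to exercise the container.
scene = SceneRepresentation(
    points=[(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (1.0, 1.0, 1.0)],
    primitives=[PrimitiveObject("cuboid", (0.5, 0.2, 0.0), (0.1, 0.1, 0.2))],
    known_objects=[KnownObject("mug", (0.3, 0.3, 0.0), (1.0, 0.0, 0.0, 0.0))],
)
nearby = scene.obstacles_near((0.0, 0.0, 0.0), 0.2)
print(len(nearby), "surface points near the query")
```

Keeping the three levels in one structure reflects their complementary roles: the point cloud serves obstacle avoidance, while the primitive and known-object entries carry the pose information a manipulation planner would need. A real system would likely replace the brute-force distance query with a spatial index.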