3D Convolution on RGB-D Point Clouds for Accurate Model-free Object Pose Estimation
The work addresses accurate object pose estimation for robotics applications without requiring 3D models, though the contribution is incremental because it builds on existing CNN methods.
The paper tackles model-free object pose estimation by proposing a two-stage pipeline that runs 3D convolutions on voxelized RGB-D point clouds. It reports translation errors of around 1 cm, rotation errors of around 5 degrees, and a success rate above 90% in robotic grasping tests.
Conventional pose estimation of a 3D object usually requires knowledge of the object's 3D model. Even with recent developments in convolutional neural networks (CNNs), a 3D model is often still necessary in the final estimation step. In this paper, we propose a two-stage pipeline that takes in raw colored point cloud data and estimates an object's translation and rotation by running 3D convolutions on voxels. The pipeline is simple yet highly accurate: translation error is reduced to the voxel resolution (around 1 cm) and rotation error is around 5 degrees. The pipeline is also evaluated in actual robotic grasping tests, where it achieves a success rate above 90% for the test objects. Another innovation is the use of a motion capture system to label the point cloud samples automatically, which makes it possible to rapidly collect a large amount of highly accurate real data for training the neural networks.
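To make the voxel input concrete, the following is a minimal sketch of how a colored point cloud might be turned into a dense grid that a 3D convolution can consume. It is an illustration under stated assumptions, not the pipeline described in the paper: the function name `voxelize_colored_points`, the four-channel layout (occupancy plus mean RGB), the 32x32x32 grid, and the 1 cm default voxel size (chosen to match the reported translation resolution) are all hypothetical.

```python
import numpy as np


def voxelize_colored_points(points, colors, origin,
                            voxel_size=0.01, grid_shape=(32, 32, 32)):
    """Bin an (N, 3) point array with (N, 3) RGB values into a dense grid.

    Returns an array of shape (4, D, H, W): channel 0 is binary occupancy,
    channels 1-3 hold the mean RGB of the points falling in each voxel.
    The layout and defaults are assumptions made for illustration only.
    """
    points = np.asarray(points, dtype=np.float64)
    colors = np.asarray(colors, dtype=np.float64)
    shape = np.asarray(grid_shape)

    # Integer voxel index of every point, measured from the grid origin.
    idx = np.floor((points - np.asarray(origin)) / voxel_size).astype(np.int64)

    # Discard points that fall outside the grid bounds.
    inside = np.all((idx >= 0) & (idx < shape), axis=1)
    idx, colors = idx[inside], colors[inside]

    # Flatten the 3D indices so per-voxel sums can be taken with bincount.
    flat = np.ravel_multi_index(tuple(idx.T), grid_shape)
    n_vox = int(np.prod(grid_shape))
    counts = np.bincount(flat, minlength=n_vox).astype(np.float64)

    grid = np.zeros((4, n_vox))
    grid[0] = counts > 0  # occupancy channel
    for c in range(3):
        sums = np.bincount(flat, weights=colors[:, c], minlength=n_vox)
        # Mean color per occupied voxel; empty voxels stay at zero.
        grid[1 + c] = np.divide(sums, counts,
                                out=np.zeros(n_vox), where=counts > 0)

    return grid.reshape((4,) + tuple(grid_shape))
```

With a grid like this, a translation estimate that is read off as a voxel index can only be as precise as the voxel size, which is consistent with the abstract's statement that translation error is reduced to the voxel resolution. The channel-first layout is simply a common input convention for 3D convolution layers, not something the source specifies.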