MVTrans: Multi-View Perception of Transparent Objects
This work addresses the open problem of transparent object perception in robotics, which is crucial for applications such as household and laboratory manipulation. Its contribution is incremental, however, since it extends existing stereo-based methods.
The paper tackles transparent object perception for robot manipulation in two ways. First, it proposes MVTrans, an end-to-end multi-view architecture that performs depth estimation, segmentation, and pose estimation. Second, it introduces Syn-TODD, a large-scale dataset for training with multiple input modalities.
Transparent object perception is a crucial skill for applications such as robot manipulation in household and laboratory settings. Existing methods use RGB-D or stereo inputs to handle a subset of perception tasks, including depth and pose estimation. However, transparent object perception remains an open problem. In this paper, we forgo the unreliable depth maps from RGB-D sensors and extend stereo-based methods. Our proposed method, MVTrans, is an end-to-end multi-view architecture with multiple perception capabilities, including depth estimation, segmentation, and pose estimation. Additionally, we establish a novel procedural pipeline for generating photo-realistic data and use it to create Syn-TODD, a large-scale transparent object detection dataset. Syn-TODD is suitable for training networks with all three modalities: RGB-D, stereo, and multi-view RGB. Project Site: https://ac-rad.github.io/MVTrans/