Bi-Directional Attention for Joint Instance and Semantic Segmentation in Point Clouds
This work addresses a fine-grained 3D scene understanding problem for computer vision applications, presenting an incremental improvement over existing multi-task learning methods.
The paper tackles joint instance and semantic segmentation in 3D point clouds by proposing a Bi-Directional Attention module that enhances feature aggregation and avoids task conflicts, and it reports superior results on the S3DIS and PartNet datasets.
Instance segmentation in point clouds is one of the most fine-grained ways to understand a 3D scene. Because it is closely related to semantic segmentation, many works address the two tasks simultaneously and leverage the benefits of multi-task learning. However, most of them consider only simple strategies such as element-wise feature fusion, which may not lead to mutual promotion. In this work, we build a Bi-Directional Attention module on backbone neural networks for 3D point cloud perception. The module uses a similarity matrix computed from the features of one task to help aggregate non-local information for the other task, avoiding potential feature exclusion and task conflict. Comprehensive experiments and ablation studies on the S3DIS and PartNet datasets verify the superiority of our method. Moreover, we analyze the mechanism by which the Bi-Directional Attention module helps joint instance and semantic segmentation.
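The cross-task aggregation described above can be illustrated with a minimal NumPy sketch. It is not the paper's implementation. The dot-product similarity, the row-wise softmax, the residual addition, the order of the two attention steps, and the function names `cross_task_attention` and `bi_directional_attention` are all assumptions made for illustration. Per-point features for N points are assumed to come from some backbone network.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_task_attention(source_feats, target_feats):
    # Similarity matrix measured from the features of one task (N x N).
    # Assumption: plain dot-product similarity, normalized per row.
    sim = source_feats @ source_feats.T
    weights = softmax(sim, axis=-1)
    # Non-local aggregation of the other task's features (N x C_target).
    return weights @ target_feats

def bi_directional_attention(sem_feats, ins_feats):
    # Assumed order: semantic similarity refines instance features first,
    # then the refined instance similarity refines semantic features.
    # The residual additions are also an assumption.
    ins_refined = ins_feats + cross_task_attention(sem_feats, ins_feats)
    sem_refined = sem_feats + cross_task_attention(ins_refined, sem_feats)
    return sem_refined, ins_refined

# Toy input: 6 points, 4-dim semantic and 3-dim instance features.
rng = np.random.default_rng(0)
sem = rng.normal(size=(6, 4))
ins = rng.normal(size=(6, 3))
sem_out, ins_out = bi_directional_attention(sem, ins)
print(sem_out.shape, ins_out.shape)
```

In this sketch the features of the two tasks are never added or concatenated element-wise. Each task contributes only the weights used to aggregate the other task's features, which is one way to read the claim that the module avoids feature exclusion and task conflict.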