CV | Jul 7, 2024

Multi-branch Collaborative Learning Network for 3D Visual Grounding

arXiv:2407.05363v2 | 28 citations | h-index: 25
Originality: Incremental advance
AI Analysis

This work targets computer vision researchers working on overlapping 3D vision tasks and aims to improve collaboration between those tasks. It is an incremental advance because it builds on existing collaborative approaches.

The paper tackles the problem of 3D visual grounding by proposing a multi-branch collaborative learning network to improve both 3D referring expression comprehension and segmentation, achieving state-of-the-art performance with a 2.05% increase in Acc@0.5 for 3DREC and a 3.96% increase in mIoU for 3DRES.

3D referring expression comprehension (3DREC) and segmentation (3DRES) have overlapping objectives, indicating their potential for collaboration. However, existing collaborative approaches predominantly depend on the results of one task to make predictions for the other, limiting effective collaboration. We argue that employing separate branches for 3DREC and 3DRES tasks enhances the model's capacity to learn specific information for each task, enabling them to acquire complementary knowledge. Thus, we propose the MCLN framework, which includes independent branches for 3DREC and 3DRES tasks. This enables dedicated exploration of each task and effective coordination between the branches. Furthermore, to facilitate mutual reinforcement between these branches, we introduce a Relative Superpoint Aggregation (RSA) module and an Adaptive Soft Alignment (ASA) module. These modules significantly contribute to the precise alignment of prediction results from the two branches, directing the module to allocate increased attention to key positions. Comprehensive experimental evaluation demonstrates that our proposed method achieves state-of-the-art performance on both the 3DREC and 3DRES tasks, with an increase of 2.05% in Acc@0.5 for 3DREC and 3.96% in mIoU for 3DRES.
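The two reported metrics can be made concrete with a minimal sketch. It assumes the conventional definitions: Acc@0.5 is the fraction of samples whose predicted box has IoU of at least 0.5 with the ground-truth box, and mIoU is the mean per-sample IoU between predicted and ground-truth point masks. The function names, the axis-aligned box layout, and the `>=` threshold comparison are illustrative assumptions, not the paper's evaluation code.

```python
from typing import Sequence

# Assumed box layout: (xmin, ymin, zmin, xmax, ymax, zmax), axis-aligned.
Box = Sequence[float]


def box_iou_3d(a: Box, b: Box) -> float:
    """IoU of two axis-aligned 3D boxes."""
    inter = 1.0
    for i in range(3):
        lo = max(a[i], b[i])
        hi = min(a[i + 3], b[i + 3])
        if hi <= lo:  # no overlap along this axis
            return 0.0
        inter *= hi - lo
    vol_a = (a[3] - a[0]) * (a[4] - a[1]) * (a[5] - a[2])
    vol_b = (b[3] - b[0]) * (b[4] - b[1]) * (b[5] - b[2])
    return inter / (vol_a + vol_b - inter)


def acc_at_iou(preds: Sequence[Box], gts: Sequence[Box], threshold: float = 0.5) -> float:
    """Fraction of samples whose box IoU reaches the threshold (3DREC-style)."""
    hits = sum(box_iou_3d(p, g) >= threshold for p, g in zip(preds, gts))
    return hits / len(gts)


def mask_iou(pred: Sequence[int], gt: Sequence[int]) -> float:
    """IoU of two binary per-point masks of equal length."""
    inter = sum(bool(p) and bool(g) for p, g in zip(pred, gt))
    union = sum(bool(p) or bool(g) for p, g in zip(pred, gt))
    return inter / union if union else 0.0


def mean_iou(pred_masks: Sequence[Sequence[int]], gt_masks: Sequence[Sequence[int]]) -> float:
    """Mean of per-sample mask IoUs (3DRES-style)."""
    ious = [mask_iou(p, g) for p, g in zip(pred_masks, gt_masks)]
    return sum(ious) / len(ious)
```

Under these definitions, a box prediction counts as correct only when it overlaps the target substantially, while mIoU rewards partial overlap of the segmentation mask in proportion to its quality.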

Code Implementations: 1 repo
