CV | Nov 4, 2024

MSTA3D: Multi-scale Twin-attention for 3D Instance Segmentation

arXiv:2411.01781v3 | 9 citations | h-index: 12 | MM
Originality: Incremental advance
AI Analysis

This paper addresses a specific bottleneck in 3D instance segmentation, a task relevant to applications like robotics and AR/VR, and represents an incremental improvement over existing transformer-based methods.

The paper tackles the over-segmentation problem in transformer-based 3D instance segmentation, especially for large objects, by proposing MSTA3D with multi-scale features and twin-attention, achieving state-of-the-art results on ScanNetV2, ScanNet200, and S3DIS datasets.

Recently, transformer-based techniques incorporating superpoints have become prevalent in 3D instance segmentation. However, they often encounter an over-segmentation problem, especially noticeable with large objects. Additionally, unreliable mask predictions stemming from superpoint mask prediction further compound this issue. To address these challenges, we propose a novel framework called MSTA3D. It leverages multi-scale feature representation and introduces a twin-attention mechanism to effectively capture these multi-scale features. Furthermore, MSTA3D integrates a box query with a box regularizer, offering a complementary spatial constraint alongside semantic queries. Experimental evaluations on the ScanNetV2, ScanNet200, and S3DIS datasets demonstrate that our approach surpasses state-of-the-art 3D instance segmentation methods.

