CV · AI · LG · RO · Nov 21, 2022

Mean Shift Mask Transformer for Unseen Object Instance Segmentation

arXiv:2211.11679v3 · 29 citations · h-index: 31
Originality: Incremental advance
AI Analysis

This work addresses the challenge of enabling robots to grasp and manipulate unseen objects, representing an incremental improvement in domain-specific instance segmentation.

The paper tackles the problem of segmenting unseen objects in images for robot manipulation by proposing the Mean Shift Mask Transformer (MSMFormer), which integrates a differentiable mean shift clustering algorithm into an end-to-end neural network, achieving competitive performance compared to state-of-the-art methods.

Segmenting unseen objects from images is a critical perception skill that a robot needs to acquire. In robot manipulation, it can enable a robot to grasp and manipulate unseen objects. Mean shift clustering is a widely used method for image segmentation tasks. However, the traditional mean shift clustering algorithm is not differentiable, making it difficult to integrate into an end-to-end neural network training framework. In this work, we propose the Mean Shift Mask Transformer (MSMFormer), a new transformer architecture that simulates the von Mises-Fisher (vMF) mean shift clustering algorithm, allowing for the joint training and inference of both the feature extractor and the clustering. Its central component is a hypersphere attention mechanism, which updates object queries on a hypersphere. To illustrate the effectiveness of our method, we apply MSMFormer to unseen object instance segmentation. Our experiments show that MSMFormer achieves competitive performance compared to state-of-the-art methods for unseen object instance segmentation. The project page, appendix, video, and code are available at https://irvlutd.github.io/MSMFormer
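The link between vMF mean shift and attention can be illustrated with a small plain-Python sketch. Under a vMF kernel exp(κ q·f), one mean shift step moves each query q to the kernel-weighted mean of the unit-norm features f and projects the result back onto the hypersphere. The normalized kernel weights are exactly softmax(κ q·f), i.e. dot-product attention, and every operation involved (softmax, weighted sum, L2 normalization) is differentiable, which is what makes unrolling the clustering inside a network possible. The helper names (`vmf_mean_shift_step`, `vmf_mean_shift`, `assign`), the concentration `kappa`, the iteration count, and the toy 2-D features are illustrative assumptions; this is not the MSMFormer implementation, which uses learned features, batched tensors, and trained transformer layers.

```python
import math


def normalize(v):
    """Project a vector onto the unit hypersphere (L2 normalization)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    """Inner product; equals cosine similarity for unit-norm vectors."""
    return sum(x * y for x, y in zip(a, b))


def vmf_mean_shift_step(queries, features, kappa):
    """One vMF mean shift update for every query, written as softmax attention.

    Weights are softmax(kappa * q . f_i), the normalized vMF kernel values.
    The weighted mean of the features is re-projected onto the hypersphere,
    so each query stays unit-norm after the step.
    """
    updated = []
    for q in queries:
        logits = [kappa * dot(q, f) for f in features]
        m = max(logits)  # subtract the max logit for numerical stability
        weights = [math.exp(l - m) for l in logits]
        total = sum(weights)
        mean = [sum(w * f[d] for w, f in zip(weights, features)) / total
                for d in range(len(q))]
        updated.append(normalize(mean))
    return updated


def vmf_mean_shift(queries, features, kappa=10.0, iters=10):
    """Run a fixed number of mean shift steps (hypothetical defaults)."""
    queries = [normalize(q) for q in queries]
    for _ in range(iters):
        queries = vmf_mean_shift_step(queries, features, kappa)
    return queries


def assign(features, queries):
    """Label each feature with the index of its most similar query."""
    return [max(range(len(queries)), key=lambda k: dot(f, queries[k]))
            for f in features]


if __name__ == "__main__":
    # Toy "pixel embeddings": two groups near (1, 0) and near (0, 1).
    feats = [normalize(v) for v in [(1.0, 0.1), (1.0, -0.1),
                                    (0.1, 1.0), (-0.1, 1.0)]]
    # Seed one query per group, then shift the queries toward the modes.
    centers = vmf_mean_shift([feats[0], feats[2]], feats)
    print(assign(feats, centers))  # [0, 0, 1, 1]
```

A larger `kappa` makes the kernel sharper (attention concentrates on the nearest features), while a smaller one lets distant features pull queries together; in this sketch it is a fixed hyperparameter chosen for the toy data.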

Code Implementations: 1 repo
