CV · May 26, 2022

Unsupervised Multi-object Segmentation Using Attention and Soft-argmax

arXiv:2205.13271v2 · h-index: 30 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses unsupervised object-centric learning for multi-object detection and segmentation. Its contribution is incremental: it builds on existing methods but adds novel architectural components.

The paper tackles unsupervised multi-object segmentation with an architecture that uses attention and soft-argmax to predict object coordinates and per-object feature vectors, a transformer encoder to handle occlusions and redundant detections, and a convolutional autoencoder to reconstruct the background. It reports significant improvements over the state of the art on complex synthetic benchmarks.

We introduce a new architecture for unsupervised object-centric representation learning and multi-object detection and segmentation, which uses a translation-equivariant attention mechanism to predict the coordinates of the objects present in the scene and to associate a feature vector to each object. A transformer encoder handles occlusions and redundant detections, and a convolutional autoencoder is in charge of background reconstruction. We show that this architecture significantly outperforms the state of the art on complex synthetic benchmarks.
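The coordinate-prediction step relies on soft-argmax. A softmax over all spatial positions turns an attention map into a probability distribution, and the expected position under that distribution is a differentiable estimate of the object's location. The same weights can pool a feature vector for the object. The sketch below is a generic NumPy soft-argmax under stated assumptions, not the authors' implementation: the shapes (K object slots, an H x W x C feature map) and the `temperature` parameter are illustrative choices.

```python
import numpy as np

def soft_argmax(logits, features, temperature=1.0):
    """Generic soft-argmax over spatial attention maps (illustrative sketch).

    logits:   (K, H, W) attention logits, one map per object slot (assumed shape)
    features: (H, W, C) feature map shared by all slots (assumed shape)
    Returns:
      coords:    (K, 2) expected (x, y) pixel coordinates under each attention map
      obj_feats: (K, C) attention-weighted feature vector for each slot
    """
    K, H, W = logits.shape
    flat = logits.reshape(K, H * W) / temperature
    flat = flat - flat.max(axis=1, keepdims=True)   # subtract max for numerical stability
    weights = np.exp(flat)
    weights /= weights.sum(axis=1, keepdims=True)   # softmax over all H*W positions

    # Pixel grid in the same row-major order as the flattened logits.
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    grid = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)  # (H*W, 2) as (x, y)

    coords = weights @ grid                               # expected location per slot
    obj_feats = weights @ features.reshape(H * W, -1)     # pooled feature per slot
    return coords, obj_feats

# Usage: two slots with sharp peaks at (x=7, y=2) and (x=1, y=5).
rng = np.random.default_rng(0)
H, W, C = 8, 10, 4
logits = np.full((2, H, W), -10.0)
logits[0, 2, 7] = 10.0
logits[1, 5, 1] = 10.0
features = rng.normal(size=(H, W, C))
coords, feats = soft_argmax(logits, features)
print(np.round(coords, 3))

# Shifting the attention maps by (dy=1, dx=2) shifts the coordinates by (dx, dy) = (2, 1),
# as long as the peaks stay away from the image borders.
shifted_coords, _ = soft_argmax(np.roll(logits, shift=(1, 2), axis=(1, 2)), features)
print(np.round(shifted_coords - coords, 3))
```

With sharp peaks, the printed coordinates are essentially (7, 2) and (1, 5), and each pooled feature is close to the feature vector at its peak. The second check hints at why pairing soft-argmax with a translation-equivariant attention mechanism is attractive: if the attention logits are computed convolutionally, shifting the scene shifts the maps, and the predicted coordinates move by the same offset.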

