CV · Mar 11

When Slots Compete: Slot Merging in Object-Centric Learning

arXiv:2603.11246v1 · 10.9 · h-index: 21
Predicted impact top 65% in CV · last 90 days · Originality: Incremental advance
AI Analysis

The paper addresses a specific bottleneck in object-centric learning for computer vision, but the advance is incremental because it builds on existing methods such as DINOSAUR.

The paper tackles the problem of multiple slots competing for overlapping regions in slot-based object-centric learning. It introduces slot merging, a lightweight operation that merges overlapping slots during training. According to the authors, this improves object factorization and mask quality and surpasses other adaptive methods on object discovery and segmentation benchmarks.

Slot-based object-centric learning represents an image as a set of latent slots with a decoder that combines them into an image or features. The decoder specifies how slots are combined into an output, but the slot set is typically fixed: the number of slots is chosen upfront and slots are only refined. This can lead to multiple slots competing for overlapping regions of the same entity rather than focusing on distinct regions. We introduce slot merging: a drop-in, lightweight operation on the slot set that merges overlapping slots during training. We quantify overlap with a Soft-IoU score between slot-attention maps and combine selected pairs via a barycentric update that preserves gradient flow. Merging follows a fixed policy, with the decision threshold inferred from overlap statistics, requiring no additional learnable modules. Integrated into the established feature-reconstruction pipeline of DINOSAUR, the proposed method improves object factorization and mask quality, surpassing other adaptive methods in object discovery and segmentation benchmarks.
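The merging step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation, and several details are assumptions made for illustration only: Soft-IoU is taken as the sum of elementwise minima over the sum of elementwise maxima of two attention maps, the barycentric weights are taken as each slot's share of the pair's total attention mass, and the threshold is taken as the mean plus one standard deviation of the off-diagonal overlaps. The function names `soft_iou` and `merge_most_overlapping` are hypothetical.

```python
import numpy as np

def soft_iou(attn):
    """Pairwise Soft-IoU between K attention maps, attn has shape (K, N).

    Assumed definition: sum(min(a, b)) / sum(max(a, b)).
    """
    inter = np.minimum(attn[:, None, :], attn[None, :, :]).sum(-1)
    union = np.maximum(attn[:, None, :], attn[None, :, :]).sum(-1)
    return inter / np.maximum(union, 1e-8)

def merge_most_overlapping(slots, attn, threshold=None):
    """Merge the single most-overlapping slot pair if it exceeds the threshold.

    slots: (K, D) slot vectors, attn: (K, N) slot-attention maps.
    Returns (slots, attn) unchanged when no pair exceeds the threshold.
    """
    K = slots.shape[0]
    iou = soft_iou(attn)
    off_diag = iou[~np.eye(K, dtype=bool)]
    if threshold is None:
        # Assumption: threshold derived from overlap statistics as mean + std.
        threshold = off_diag.mean() + off_diag.std()

    masked = iou.copy()
    np.fill_diagonal(masked, -np.inf)  # ignore self-overlap
    i, j = np.unravel_index(np.argmax(masked), masked.shape)
    if masked[i, j] <= threshold:
        return slots, attn

    # Assumption: barycentric weights proportional to attention mass.
    mass_i, mass_j = attn[i].sum(), attn[j].sum()
    w = mass_i / (mass_i + mass_j)
    merged_slot = w * slots[i] + (1.0 - w) * slots[j]
    merged_attn = attn[i] + attn[j]

    # Keep the untouched slots and append the merged one at the end.
    keep = np.array([k for k in range(K) if k not in (i, j)], dtype=int)
    new_slots = np.vstack([slots[keep], merged_slot[None]])
    new_attn = np.vstack([attn[keep], merged_attn[None]])
    return new_slots, new_attn
```

Because the merged slot is a convex combination of the two originals, the same update written in an autodiff framework would pass gradients to both source slots, which is consistent with the abstract's remark about preserving gradient flow. The sketch merges at most one pair per call; how many pairs are merged per step in the paper is not stated in the abstract.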

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
