LG, CV | Jan 30, 2023

Unlocking Slot Attention by Changing Optimal Transport Costs

arXiv:2301.13197v2 | 21 citations | h-index: 67
Originality: Incremental advance
AI Analysis

This is an incremental improvement for object-centric modeling in computer vision.

The paper addresses slot attention's inability to handle videos with a dynamic number of objects, a limitation that follows from its set-equivariance. It connects slot attention to optimal transport and proposes MESH, which improves performance on object-centric learning benchmarks.

Slot attention is a powerful method for object-centric modeling in images and videos. However, its set-equivariance limits its ability to handle videos with a dynamic number of objects because it cannot break ties. To overcome this limitation, we first establish a connection between slot attention and optimal transport. Based on this new perspective we propose MESH (Minimize Entropy of Sinkhorn): a cross-attention module that combines the tiebreaking properties of unregularized optimal transport with the speed of regularized optimal transport. We evaluate slot attention using MESH on multiple object-centric learning benchmarks and find significant improvements over slot attention in every setting.
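
To make the optimal transport view concrete, here is a minimal sketch of the two ingredients the abstract names: a Sinkhorn solver for entropy-regularized optimal transport, and the entropy of the resulting transport plan. It is not the authors' implementation and does not perform MESH's entropy minimization. The function names `sinkhorn` and `plan_entropy`, the uniform marginals, and the values of `eps` and `n_iters` are all assumptions chosen for illustration.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def sinkhorn(cost, eps=0.1, n_iters=200):
    """Entropy-regularized OT with uniform marginals (assumed), log domain.

    Returns a plan P where P[i][j] = exp((f[i] + g[j] - cost[i][j]) / eps),
    with rows summing to about 1/n and columns summing to 1/m.
    """
    n, m = len(cost), len(cost[0])
    log_a = -math.log(n)  # log of uniform row marginal
    log_b = -math.log(m)  # log of uniform column marginal
    f = [0.0] * n  # dual potentials for rows
    g = [0.0] * m  # dual potentials for columns
    for _ in range(n_iters):
        # Rescale rows so that each row sums to 1/n.
        for i in range(n):
            f[i] = eps * (log_a - logsumexp(
                [(g[j] - cost[i][j]) / eps for j in range(m)]))
        # Rescale columns so that each column sums to 1/m.
        for j in range(m):
            g[j] = eps * (log_b - logsumexp(
                [(f[i] - cost[i][j]) / eps for i in range(n)]))
    return [[math.exp((f[i] + g[j] - cost[i][j]) / eps) for j in range(m)]
            for i in range(n)]


def plan_entropy(plan):
    """Shannon entropy of the plan, treating 0 * log(0) as 0."""
    return -sum(p * math.log(p) for row in plan for p in row if p > 0.0)


# Tied costs: every assignment is equally good, so Sinkhorn cannot break
# the tie and returns the uniform (maximum-entropy) plan.
tied_plan = sinkhorn([[1.0, 1.0], [1.0, 1.0]])

# Distinct costs: the plan concentrates on the cheap diagonal assignment,
# so its entropy is lower.
sharp_plan = sinkhorn([[0.0, 1.0], [1.0, 0.0]])

print("entropy (tied):    ", plan_entropy(tied_plan))
print("entropy (distinct):", plan_entropy(sharp_plan))
```

In the tied case the plan spreads mass evenly over all assignments, which is the failure to break ties that the abstract describes. A lower-entropy plan corresponds to a more decisive assignment, which is the quantity that MESH (Minimize Entropy of Sinkhorn) takes its name from.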

Code Implementations: 1 repo