LG CV Jan 30, 2023

Unlocking Slot Attention by Changing Optimal Transport Costs

Yan Zhang, David W. Zhang, Simon Lacoste-Julien, Gertjan J. Burghouts, Cees G. M. Snoek

arXiv:2301.13197v2 17.021 citations h-index: 67 Has Code

Originality: Incremental advance

AI Analysis

This is an incremental improvement for object-centric modeling in computer vision.

The paper tackled slot attention's inability to handle videos with a dynamic number of objects, a limitation that stems from its set-equivariance: the module cannot break ties. It connected slot attention to optimal transport and proposed MESH, which improved performance on object-centric learning benchmarks.

Slot attention is a powerful method for object-centric modeling in images and videos. However, its set-equivariance limits its ability to handle videos with a dynamic number of objects because it cannot break ties. To overcome this limitation, we first establish a connection between slot attention and optimal transport. Based on this new perspective we propose MESH (Minimize Entropy of Sinkhorn): a cross-attention module that combines the tiebreaking properties of unregularized optimal transport with the speed of regularized optimal transport. We evaluate slot attention using MESH on multiple object-centric learning benchmarks and find significant improvements over slot attention in every setting.
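
The abstract does not spell out the algorithm, so the following is only a minimal sketch of one reading of the name "Minimize Entropy of Sinkhorn", not the authors' implementation. It shows that entropy-regularized optimal transport (Sinkhorn) returns a uniform plan when all costs are tied, and that a small perturbation followed by a few gradient steps on the cost, chosen to lower the entropy of the Sinkhorn plan, gives a much sharper assignment. The function names `sinkhorn`, `entropy`, and `minimize_entropy`, the uniform marginals, the fixed perturbation, the finite-difference gradients, and the step size are all assumptions made for illustration.

```python
import math


def sinkhorn(cost, eps=1.0, n_iters=50):
    """Entropy-regularized OT plan with uniform marginals (an assumption)."""
    n, m = len(cost), len(cost[0])
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [(1.0 / n) / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [(1.0 / m) / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def entropy(plan):
    """Shannon entropy of a transport plan."""
    return -sum(p * math.log(p) for row in plan for p in row if p > 0.0)


def minimize_entropy(cost, eps=1.0, lr=4.0, steps=20, h=1e-4):
    """Gradient descent on the cost to lower the entropy of sinkhorn(cost).

    Gradients are central finite differences, which is only practical for
    tiny matrices; a real implementation would use automatic differentiation.
    """
    cost = [row[:] for row in cost]
    n, m = len(cost), len(cost[0])
    for _ in range(steps):
        grad = [[0.0] * m for _ in range(n)]
        for i in range(n):
            for j in range(m):
                original = cost[i][j]
                cost[i][j] = original + h
                e_plus = entropy(sinkhorn(cost, eps))
                cost[i][j] = original - h
                e_minus = entropy(sinkhorn(cost, eps))
                cost[i][j] = original
                grad[i][j] = (e_plus - e_minus) / (2 * h)
        cost = [[cost[i][j] - lr * grad[i][j] for j in range(m)] for i in range(n)]
    return cost


# Fully tied costs: Sinkhorn cannot prefer any assignment, so the plan is uniform.
tied_cost = [[0.0, 0.0], [0.0, 0.0]]
tied_plan = sinkhorn(tied_cost)

# A small fixed perturbation (hypothetical stand-in for random noise) breaks
# the symmetry; entropy minimization then amplifies it.
perturbed_cost = [[0.01, 0.0], [0.0, 0.0]]
sharp_plan = sinkhorn(minimize_entropy(perturbed_cost))

print("entropy of tied plan: ", entropy(tied_plan))
print("entropy of sharp plan:", entropy(sharp_plan))
```

At an exact tie the entropy is at its maximum and its gradient with respect to the cost vanishes, which is why the sketch perturbs the cost before descending. The regularization strength `eps` is left unchanged throughout, so the sharper plan comes from the modified cost rather than from weaker regularization.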
