Exploring the Role of the Bottleneck in Slot-Based Models Through Covariance Regularization
This work addresses a specific architectural limitation of slot-based models for computer vision, the bottleneck, but its contribution is incremental: the proposed method only partially closes the performance gap between image reconstruction and feature reconstruction objectives.
The researchers tackled the performance gap between slot-based models trained with an image reconstruction objective and those trained with a feature reconstruction objective. They proposed covariance regularization to constrain the bottleneck, which allows larger encoders to be used without producing degenerate masks. Their method improved over baseline Slot Attention but did not match the performance of the state-of-the-art method on COCO2017.
In this project, we attempt to make slot-based models with an image reconstruction objective competitive with those that use a feature reconstruction objective on real-world datasets. We propose a loss-based approach to constricting the bottleneck of slot-based models, allowing larger-capacity encoder networks to be used with Slot Attention without producing degenerate stripe-shaped masks. We find that our proposed method offers an improvement over the baseline Slot Attention model but does not reach the performance of DINOSAUR on the COCO2017 dataset. Throughout this project, we confirm the superiority of a feature reconstruction objective over an image reconstruction objective and explore the role of the architectural bottleneck in slot-based models.
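To make the idea of a loss-based constraint more concrete, the following is a minimal sketch of one common form of covariance regularization: a penalty on the off-diagonal entries of the feature covariance matrix computed over slot vectors. The abstract does not state the exact form of the proposed loss, so the function name `covariance_penalty`, the input shape `(batch, num_slots, dim)`, and the normalization by `dim` are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def covariance_penalty(slots: np.ndarray) -> float:
    """Hypothetical off-diagonal covariance penalty over slot features.

    slots: array of shape (batch, num_slots, dim).
    Returns the sum of squared off-diagonal covariance entries divided by dim.
    """
    b, k, d = slots.shape
    # Treat every slot in the batch as one sample of a d-dimensional vector.
    x = slots.reshape(b * k, d)
    # Center each feature dimension before estimating the covariance.
    x = x - x.mean(axis=0, keepdims=True)
    cov = x.T @ x / (x.shape[0] - 1)
    # Zero the diagonal so that only cross-feature covariances are penalized.
    off_diag = cov - np.diag(np.diag(cov))
    return float((off_diag ** 2).sum() / d)


# Two toy inputs: one with uncorrelated features, one with identical features.
uncorrelated = np.array([[[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]])
correlated = np.array([[[1.0, 1.0], [-1.0, -1.0], [1.0, 1.0], [-1.0, -1.0]]])
print(covariance_penalty(uncorrelated))
print(covariance_penalty(correlated))
```

In a training loop, a term like this would typically be added to the image reconstruction loss with a scalar weight (a hypothetical hyperparameter here), so that the regularizer acts on the slot representations while the reconstruction objective is left unchanged.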