CV · Jul 1, 2024

Grouped Discrete Representation Guides Object-Centric Learning

Rongzhen Zhao, Vivienne Wang, Juho Kannala, Joni Pajarinen

arXiv:2407.01726v33.72 citations · h-index: 45

Originality: Incremental advance

AI Analysis

This work addresses a specific bottleneck in object-centric learning for computer vision, offering an incremental improvement over existing transformer-based methods.

The paper targets a limitation of discrete representation methods in object-centric learning (OCL): they treat each feature as an indivisible unit and index it with a single number. This overlooks the attributes a feature is composed of and discards attribute-level commonalities, which hurts generalization and convergence. The proposed Grouped Discrete Representation (GDR) groups features into attributes and indexes them with tuples. The authors report consistent improvements in convergence and generalizability across query initializations, dataset modalities, and model architectures.

Similar to humans perceiving visual scenes as objects, Object-Centric Learning (OCL) can abstract dense images or videos into sparse object-level features. Transformer-based OCL handles complex textures well due to the decoding guidance of discrete representation, obtained by discretizing noisy features in image or video feature maps using template features from a codebook. However, treating features as minimal units overlooks their composing attributes, thus impeding model generalization; indexing features with natural numbers loses attribute-level commonalities and characteristics, thus diminishing heuristics for model convergence. We propose Grouped Discrete Representation (GDR) to address these issues by grouping features into attributes and indexing them with tuple numbers. In extensive experiments across different query initializations, dataset modalities, and model architectures, GDR consistently improves convergence and generalizability. Visualizations show that our method effectively captures attribute-level information in features. The source code will be available upon acceptance.
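
The contrast between natural-number indexing and tuple indexing can be illustrated with a toy sketch. This is not the authors' implementation (the abstract says the source code is not yet released). It assumes the simplest reading of the idea: a feature vector is split into equal chunks, one per attribute group, and each chunk is matched to its nearest code in that group's own small codebook. The function names, the two "color" and "shape" codebooks, and all numeric values are hypothetical.

```python
def nearest(vec, codebook):
    """Index of the codebook row closest to vec (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((v - c) ** 2 for v, c in zip(vec, codebook[i])),
    )


def grouped_quantize(feature, group_codebooks):
    """Quantize each chunk of `feature` against its own group's codebook.

    Returns (tuple_index, quantized_feature). Assumes len(feature) is
    divisible by the number of groups.
    """
    groups = len(group_codebooks)
    dim = len(feature) // groups
    index, quantized = [], []
    for k, codebook in enumerate(group_codebooks):
        chunk = feature[k * dim:(k + 1) * dim]
        i = nearest(chunk, codebook)
        index.append(i)
        quantized.extend(codebook[i])
    return tuple(index), quantized


def tuple_to_scalar(index, sizes):
    """Flatten a tuple index into the single natural number an ungrouped
    codebook would assign to the same combination (mixed-radix encoding)."""
    n = 0
    for i, size in zip(index, sizes):
        n = n * size + i
    return n


# Hypothetical toy codebooks: two attribute groups with 2-dim codes each.
color = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # 3 codes
shape = [[0.0, 0.0], [1.0, 1.0]]              # 2 codes
codebooks = [color, shape]
sizes = [len(color), len(shape)]

# Two noisy features that differ in the first attribute but agree on the second.
a_idx, a_q = grouped_quantize([0.9, 0.1, 0.8, 1.1], codebooks)
b_idx, b_q = grouped_quantize([0.1, 0.9, 0.9, 1.0], codebooks)

print(a_idx, tuple_to_scalar(a_idx, sizes))
print(b_idx, tuple_to_scalar(b_idx, sizes))
```

In this toy setup the two tuple indices share their second component, so the attribute the features have in common stays visible in the index itself, while the flattened scalar indices are just two unrelated integers. The grouped codebooks also store 3 + 2 = 5 codes yet can express 3 × 2 = 6 combinations, which is one plausible reading of why grouping could help generalization.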
