CV · Feb 27, 2025

Vector-Quantized Vision Foundation Models for Object-Centric Learning

arXiv:2502.20263v6 · 9 citations · h-index: 45 · Has Code · MM
Originality: Incremental advance
AI Analysis

This work aims to improve object-centric learning for computer vision and represents an incremental advance: it unifies existing methods through shared quantization.

The paper tackles the difficulty that object-centric learning (OCL) has with complex object textures. It proposes a unified architecture, VQ-VFM-OCL, which applies shared quantization to vision foundation model (VFM) representations for both aggregation and decoding, and it consistently outperforms baselines in object discovery, recognition, and downstream tasks.
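To make the quantization step concrete, here is a minimal sketch assuming a plain nearest-neighbour codebook lookup (as in VQ-VAE) over per-patch VFM feature vectors. The helpers `quantize` and `mse`, the toy codebook, and the 2-D features are hypothetical; the paper's actual quantizer, codebook learning, and loss terms are not given here and may differ.

```python
from typing import List, Tuple

Vector = List[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(features: List[Vector], codebook: List[Vector]) -> Tuple[List[int], List[Vector]]:
    """Replace each feature vector by its nearest codebook entry (hypothetical helper).

    Returns the chosen code indices and the quantized vectors.
    Ties go to the lowest index because min() keeps the first minimum.
    """
    indices: List[int] = []
    quantized: List[Vector] = []
    for f in features:
        best = min(range(len(codebook)), key=lambda k: squared_distance(f, codebook[k]))
        indices.append(best)
        quantized.append(list(codebook[best]))
    return indices, quantized


def mse(pred: List[Vector], target: List[Vector]) -> float:
    """Mean squared error over all elements, e.g. decoder output vs. quantized target."""
    count = sum(len(t) for t in target)
    return sum(squared_distance(p, t) for p, t in zip(pred, target)) / count


if __name__ == "__main__":
    # Hypothetical 2-D "VFM features" for four patches and a 3-entry codebook.
    feats = [[0.1, -0.2], [0.9, 1.2], [-0.8, 0.7], [0.6, 0.5]]
    codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
    idx, targets = quantize(feats, codebook)
    print(idx)  # [0, 1, 2, 1]
    # In this sketch the quantized vectors serve as the decoder's reconstruction target.
    decoded = [[0.0, 0.1], [1.0, 0.9], [-1.0, 1.0], [0.8, 1.1]]  # hypothetical decoder output
    print(mse(decoded, targets))
```

Snapping each feature to a finite set of codes gives the decoder a discrete target; how VVO shares one quantizer between aggregation and decoding is not modeled here.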

Object-Centric Learning (OCL) aggregates image or video feature maps into object-level feature vectors, termed slots. Its self-supervision, which reconstructs the input from slots, struggles with complex object textures, so Vision Foundation Model (VFM) representations are used as both the aggregation input and the reconstruction target. Existing methods leverage VFM representations in diverse ways yet fail to fully exploit their potential. In response, we propose a unified architecture, Vector-Quantized VFMs for OCL (VQ-VFM-OCL, or VVO). The key to our unification is simply the shared quantization of VFM representations in OCL aggregation and decoding. Experiments show that across different VFMs, aggregators and decoders, our VVO consistently outperforms baselines in object discovery and recognition, as well as in downstream visual prediction and reasoning. We also mathematically analyze why VFM representations facilitate OCL aggregation and why their shared quantization as reconstruction targets strengthens OCL supervision. Our source code and model checkpoints are available at https://github.com/Genera1Z/VQ-VFM-OCL.
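The aggregation of feature maps into slots is often done with Slot Attention, a widely used OCL aggregator. The sketch below is a stripped-down version under stated assumptions: identity query/key/value projections, fixed (rather than sampled) initial slots, and no GRU, MLP, or layer normalization. The function `slot_attention` and the toy inputs are hypothetical stand-ins for a flattened VFM feature map; this is not the paper's aggregator.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax (subtracts the max logit first)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def slot_attention(
    inputs: List[Vector],
    slots: List[Vector],
    iters: int = 3,
    eps: float = 1e-8,
) -> Tuple[List[Vector], List[Vector]]:
    """Simplified Slot Attention with identity projections (hypothetical helper).

    inputs: N feature vectors (e.g. a flattened VFM feature map).
    slots:  K initial slot vectors of the same dimension.
    Returns the updated slots and the last N x K attention matrix.
    """
    dim = len(inputs[0])
    scale = 1.0 / math.sqrt(dim)
    attn: List[Vector] = []
    for _ in range(iters):
        # attn[n][k]: softmax over slots, so slots compete to explain each input.
        attn = [softmax([dot(x, s) * scale for s in slots]) for x in inputs]
        new_slots: List[Vector] = []
        for k in range(len(slots)):
            weights = [row[k] for row in attn]
            total = sum(weights) + eps
            # Update slot k as the attention-weighted mean of the inputs.
            new_slots.append(
                [sum(w * x[d] for w, x in zip(weights, inputs)) / total for d in range(dim)]
            )
        slots = new_slots
    return slots, attn


if __name__ == "__main__":
    # Two hypothetical "objects": feature vectors clustered near each axis.
    feats = [[4.0, 0.0], [3.6, 0.4], [0.0, 4.0], [0.4, 3.6]]
    init = [[1.0, 0.0], [0.0, 1.0]]  # fixed initial slots instead of sampled ones
    slots, attn = slot_attention(feats, init)
    print([[round(v, 2) for v in s] for s in slots])  # roughly [[3.8, 0.2], [0.2, 3.8]]
```

Because the softmax runs over slots rather than over inputs, each feature vector is divided among competing slots, and each slot is then updated from the features it wins.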

Code Implementations: 1 repo
