LG, CV, SD, AS | Nov 4, 2024

Addressing Representation Collapse in Vector Quantized Models with One Linear Layer

Peking U
arXiv:2411.02038v3 | 75 citations | h-index: 9 | Has Code
Originality: Incremental advance
AI Analysis

The paper addresses a key bottleneck in unsupervised learning and offers researchers and practitioners a simple way to improve scalability, though the advance is incremental because it builds on existing VQ methods.

The paper tackled representation collapse in Vector Quantized models by proposing SimVQ, a method that reparameterizes code vectors using a learnable linear layer, which improved codebook usage and generalized across image and audio tasks.

Vector Quantization (VQ) is essential for discretizing continuous representations in unsupervised learning but suffers from representation collapse, causing low codebook utilization and limiting scalability. Existing solutions often rely on complex optimizations or reduce latent dimensionality, which compromises model capacity and fails to fully solve the problem. We identify the root cause as disjoint codebook optimization, where only a few code vectors are updated via gradient descent. To fix this, we propose SimpleVQ (SimVQ), which reparameterizes code vectors through a learnable linear transformation layer over a latent basis, optimizing the entire linear space rather than only the nearest individual code vectors. Although the multiplication of two linear matrices is equivalent to applying a single linear layer, this simple approach effectively prevents collapse. Extensive experiments on image and audio tasks demonstrate that SimVQ improves codebook usage, is easy to implement, and generalizes well across modalities and architectures. The code is available at https://github.com/youngsheen/SimVQ.
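The reparameterization described in the abstract can be illustrated with a small sketch. This is not the authors' implementation (see the linked repository for that). It is a pure-Python toy that assumes the latent basis is kept frozen, that the linear layer `w` starts as the identity, and that the loss is the squared distance between an encoder output `z` and its nearest code. All names (`basis`, `w`, `sgd_step_on_w`) are hypothetical. The point is that a gradient step on `w`, triggered by a single selected code, shifts every code `basis[i] @ w` whose basis vector is not orthogonal to the selected one.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def nearest(codes, z):
    """Index of the code vector closest to z in squared Euclidean distance."""
    dists = [sum((c - x) ** 2 for c, x in zip(code, z)) for code in codes]
    return dists.index(min(dists))

def sgd_step_on_w(basis, w, z, lr=0.1):
    """One gradient step on W for the loss ||z - basis[k] @ W||^2, k = nearest code."""
    codes = matmul(basis, w)              # effective codebook: basis @ W
    k = nearest(codes, z)
    residual = [c - x for c, x in zip(codes[k], z)]   # basis[k] @ W - z
    # dL/dW[i][j] = 2 * basis[k][i] * residual[j]
    new_w = [[w[i][j] - lr * 2.0 * basis[k][i] * residual[j]
              for j in range(len(w[0]))] for i in range(len(w))]
    return new_w, k

rng = random.Random(0)
num_codes, dim = 8, 4
# Latent basis: assumed frozen in this sketch; only W is updated.
basis = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_codes)]
# Identity initialization of W is an assumption made for illustration.
w = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
z = [rng.gauss(0.0, 1.0) for _ in range(dim)]   # stand-in for one encoder output

before = matmul(basis, w)
w, selected = sgd_step_on_w(basis, w, z)
after = matmul(basis, w)

# Indices of codes that changed, although only one code was selected.
moved = [i for i in range(num_codes) if before[i] != after[i]]
print("selected code:", selected, "codes that moved:", moved)
```

In a vanilla VQ codebook the same step would touch only the row `selected`. Here the update passes through the shared matrix `w`, which is how the abstract's "entire linear space" is optimized instead of individual code vectors.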

Code Implementations: 2 repos
