CV | Mar 16

SNCE: Geometry-Aware Supervision for Scalable Discrete Image Generation

arXiv:2603.15150 | 95.7 | h-index: 17
AI Analysis

This work addresses an optimization problem in discrete image generation and gives researchers and practitioners a way to improve both training efficiency and generation quality. The contribution is incremental, since it builds on existing VQ-based approaches.

The paper tackles the difficulty of training generative models with large vector quantization (VQ) codebooks, which typically require larger models and longer training schedules. It proposes Stochastic Neighbor Cross Entropy Minimization (SNCE) and reports significantly faster convergence and higher generation quality on tasks including class-conditional ImageNet-256 generation and text-to-image synthesis.

Recent advances in discrete image generation have shown that scaling the VQ codebook size significantly improves reconstruction fidelity. However, training generative models with a large VQ codebook remains challenging, typically requiring a larger model and a longer training schedule. In this work, we propose Stochastic Neighbor Cross Entropy Minimization (SNCE), a novel training objective designed to address the optimization challenges of large-codebook discrete image generators. Instead of supervising the model with a hard one-hot target, SNCE constructs a soft categorical distribution over a set of neighboring tokens. The probability assigned to each token is proportional to the proximity between its code embedding and the ground-truth image embedding, encouraging the model to capture semantically meaningful geometric structure in the quantized embedding space. We conduct extensive experiments across class-conditional ImageNet-256 generation, large-scale text-to-image synthesis, and image editing tasks. Results show that SNCE significantly improves convergence speed and overall generation quality compared to standard cross-entropy objectives.
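The core idea (replace the one-hot label with a soft distribution over codebook entries near the ground-truth embedding) can be illustrated with a small sketch. This is not the authors' implementation. It assumes: the "ground-truth image embedding" is the continuous encoder output before quantization; proximity is measured by squared Euclidean distance; the neighbor set is the `k` nearest codes; and weights come from a softmax of negative distances with temperature `tau`. The names `neighbor_soft_target`, `snce_like_loss`, `k`, and `tau` are hypothetical.

```python
import math
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]


def neighbor_soft_target(
    z: Sequence[float],
    codebook: Sequence[Sequence[float]],
    k: int = 3,
    tau: float = 1.0,
) -> List[float]:
    """Soft target over the k codebook entries nearest to the embedding z.

    ASSUMPTION: weights are a softmax of negative squared distances scaled by
    tau; every code outside the k nearest neighbors gets probability 0.
    """
    dists = [squared_distance(z, c) for c in codebook]
    neighbors = sorted(range(len(codebook)), key=lambda i: dists[i])[:k]
    scores = [-dists[i] / tau for i in neighbors]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    target = [0.0] * len(codebook)
    for i, w in zip(neighbors, weights):
        target[i] = w / total
    return target


def snce_like_loss(
    logits: Sequence[float],
    z: Sequence[float],
    codebook: Sequence[Sequence[float]],
    k: int = 3,
    tau: float = 1.0,
) -> float:
    """Cross-entropy between the soft neighbor target and the model's logits."""
    target = neighbor_soft_target(z, codebook, k, tau)
    log_probs = log_softmax(logits)
    return -sum(t * lp for t, lp in zip(target, log_probs) if t > 0.0)


# Usage: a 1-D toy codebook and an embedding that falls between codes 0 and 1.
codebook = [[0.0], [1.0], [2.0], [10.0]]
print(neighbor_soft_target([0.4], codebook, k=2, tau=1.0))
print(snce_like_loss([2.0, 1.0, 0.5, -1.0], [0.4], codebook, k=2))
```

With `k=1` the target collapses to the usual one-hot label of the nearest code, so the loss reduces to standard cross-entropy; a larger `k` or `tau` spreads probability mass onto nearby codes, which is what lets the model be rewarded for predicting a geometrically close token rather than only the exact one.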
