CV · LG · MM · Jun 9, 2025

VIVAT: Virtuous Improving VAE Training through Artifact Mitigation

arXiv:2506.07863v1 · 1 citation · h-index: 7
Originality: Incremental advance
AI Analysis

This work addresses practical challenges for researchers and practitioners in generative computer vision by optimizing VAE training, though it is incremental as it builds on the existing KL-VAE framework without radical changes.

The paper tackles artifacts that degrade reconstruction and generation quality in Variational Autoencoders (VAEs). It introduces VIVAT, a systematic approach that reports state-of-the-art PSNR and SSIM on multiple image reconstruction benchmarks and higher CLIP scores for text-to-image generation.

Variational Autoencoders (VAEs) remain a cornerstone of generative computer vision, yet their training is often plagued by artifacts that degrade reconstruction and generation quality. This paper introduces VIVAT, a systematic approach to mitigating common artifacts in KL-VAE training without requiring radical architectural changes. We present a detailed taxonomy of five prevalent artifacts - color shift, grid patterns, blur, corner and droplet artifacts - and analyze their root causes. Through straightforward modifications, including adjustments to loss weights, padding strategies, and the integration of Spatially Conditional Normalization, we demonstrate significant improvements in VAE performance. Our method achieves state-of-the-art results in image reconstruction metrics (PSNR and SSIM) across multiple benchmarks and enhances text-to-image generation quality, as evidenced by superior CLIP scores. By preserving the simplicity of the KL-VAE framework while addressing its practical challenges, VIVAT offers actionable insights for researchers and practitioners aiming to optimize VAE training.
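For readers unfamiliar with the reconstruction metrics named in the abstract, the following is a minimal sketch of PSNR using its standard definition, 10 · log10(MAX² / MSE). It is not taken from the paper: the function name `psnr`, the flat pixel-sequence input, and the default `max_value` of 255 are assumptions made for illustration only.

```python
import math


def psnr(reference, reconstruction, max_value=255.0):
    """Peak signal-to-noise ratio (dB) between two equal-length pixel sequences.

    Hypothetical helper for illustration; inputs are flat sequences of pixel
    intensities, and max_value is the largest possible intensity (assumed 255).
    """
    if len(reference) != len(reconstruction) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error over all pixels.
    mse = sum((r - x) ** 2 for r, x in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher values indicate a reconstruction closer to the reference. Real evaluations would typically operate on full image arrays and average over a dataset.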

