CV | AI | Jul 12, 2025

AlphaVAE: Unified End-to-End RGBA Image Reconstruction and Generation with Alpha-Aware Representation Learning

arXiv:2507.09308v1 | 6 citations | h-index: 3 | Has Code
Originality: Incremental advance
AI Analysis

This work addresses the challenge of RGBA image generation for applications requiring transparent content, representing an incremental advance by extending existing RGB methods to four-channel images.

The paper tackles the generation of transparent or layered RGBA images, an area that lacks large-scale benchmarks. It introduces ALPHA, which the authors describe as the first comprehensive RGBA benchmark, and ALPHAVAE, a unified end-to-end RGBA VAE that improves reconstruction over LayerDiffuse by +4.9 dB in PSNR and +3.2% in SSIM.

Recent advances in latent diffusion models have achieved remarkable results in high-fidelity RGB image synthesis by leveraging pretrained VAEs to compress and reconstruct pixel data at low computational cost. However, the generation of transparent or layered content (RGBA images) remains largely unexplored due to the lack of large-scale benchmarks. In this work, we propose ALPHA, the first comprehensive RGBA benchmark that adapts standard RGB metrics to four-channel images via alpha blending over canonical backgrounds. We further introduce ALPHAVAE, a unified end-to-end RGBA VAE that extends a pretrained RGB VAE by incorporating a dedicated alpha channel. The model is trained with a composite objective that combines alpha-blended pixel reconstruction, patch-level fidelity, perceptual consistency, and dual KL divergence constraints to ensure latent fidelity across both RGB and alpha representations. Our RGBA VAE, trained on only 8K images in contrast to the 1M used by prior methods, achieves a +4.9 dB improvement in PSNR and a +3.2% increase in SSIM over LayerDiffuse in reconstruction. It also enables superior transparent image generation when fine-tuned within a latent diffusion framework. Our code, data, and models are released at https://github.com/o0o0o00o0/AlphaVAE for reproducibility.
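
The idea of adapting an RGB metric to RGBA "via alpha blending over canonical backgrounds" can be illustrated with a minimal sketch. This is not the ALPHA benchmark's implementation. The background set (black, white, mid-gray), the straight (non-premultiplied) alpha convention, the averaging across backgrounds, and the function names alpha_blend, psnr, and rgba_psnr are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical "canonical" backgrounds; the benchmark's actual set is not stated here.
BACKGROUNDS = {
    "black": (0.0, 0.0, 0.0),
    "white": (1.0, 1.0, 1.0),
    "gray": (0.5, 0.5, 0.5),
}


def alpha_blend(rgba, background):
    """Composite an H x W x 4 float image in [0, 1] (straight alpha) over a solid color."""
    rgb, alpha = rgba[..., :3], rgba[..., 3:4]
    bg = np.asarray(background, dtype=rgba.dtype)
    return alpha * rgb + (1.0 - alpha) * bg


def psnr(a, b, max_val=1.0):
    """Standard PSNR in dB between two arrays with values in [0, max_val]."""
    mse = np.mean((a - b) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(max_val ** 2 / mse))


def rgba_psnr(reference, reconstruction, backgrounds=BACKGROUNDS):
    """Average the RGB PSNR of the two images composited over each background."""
    scores = [
        psnr(alpha_blend(reference, color), alpha_blend(reconstruction, color))
        for color in backgrounds.values()
    ]
    return float(np.mean(scores))
```

Under this sketch, color values hidden beneath fully transparent pixels do not affect the score, because they vanish after compositing. Evaluating over more than one background matters because an alpha error can be invisible on a background that happens to match the foreground color.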

