LCUDiff: Latent Capacity Upgrade Diffusion for Faithful Human Body Restoration
This work addresses the fidelity bottleneck that the variational autoencoder (VAE) imposes on diffusion-based restoration of human-centric images, offering an incremental improvement for applications such as photo enhancement or medical imaging.
The paper tackles insufficient fidelity in restoring human bodies from degraded images. It proposes LCUDiff, a framework that upgrades a pre-trained latent diffusion model from a 4-channel to a 16-channel latent space. The reported results are competitive, with higher fidelity and fewer artifacts under mild degradations, while preserving one-step efficiency.
Existing methods for restoring degraded human-centric images often struggle with insufficient fidelity, particularly in human body restoration (HBR). Recent diffusion-based restoration methods commonly adapt pre-trained text-to-image diffusion models, where the variational autoencoder (VAE) can significantly bottleneck restoration fidelity. We propose LCUDiff, a stable one-step framework that upgrades a pre-trained latent diffusion model from a 4-channel latent space to a 16-channel latent space. For VAE fine-tuning, channel splitting distillation (CSD) is used to keep the first four channels aligned with pre-trained priors while allocating the additional channels to effectively encode high-frequency details. We further design prior-preserving adaptation (PPA) to smoothly bridge the mismatch between 4-channel diffusion backbones and the higher-dimensional 16-channel latent. In addition, we propose a decoder router (DeR) for per-sample decoder routing using restoration-quality score annotations, which improves visual quality across diverse conditions. Experiments on synthetic and real-world datasets show competitive results with higher fidelity and fewer artifacts under mild degradations, while preserving one-step efficiency. The code and model will be available at https://github.com/gobunu/LCUDiff.
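To make the channel splitting idea concrete, the following is a minimal sketch of what a CSD-style alignment term could look like. The abstract does not specify the loss, so the mean squared error used here, the function name `channel_split_distillation_loss`, and the nested-list latent layout are all illustrative assumptions, not the authors' implementation. The sketch only penalizes the first four channels of the 16-channel latent against the pre-trained 4-channel latent and leaves the remaining twelve channels unconstrained.

```python
from typing import List

# Assumed layout: a latent is a list of channels, each a flat list of values.
Latent = List[List[float]]


def channel_split_distillation_loss(student: Latent, teacher: Latent) -> float:
    """Hypothetical CSD-style term: MSE between the teacher latent and the
    leading channels of the student latent. Extra student channels are ignored,
    so they stay free to encode additional detail."""
    n_prior = len(teacher)
    if n_prior == 0 or len(student) < n_prior:
        raise ValueError("student must have at least as many channels as a non-empty teacher")

    total = 0.0
    count = 0
    for s_channel, t_channel in zip(student[:n_prior], teacher):
        if len(s_channel) != len(t_channel):
            raise ValueError("spatial sizes of student and teacher channels must match")
        for s, t in zip(s_channel, t_channel):
            total += (s - t) ** 2
            count += 1
    if count == 0:
        raise ValueError("channels must contain at least one value")
    return total / count


# Usage: a 4-channel teacher latent and a 16-channel student latent whose
# first four channels copy the teacher; the other twelve hold arbitrary values.
teacher_latent = [[0.0, 1.0] for _ in range(4)]
student_latent = [ch[:] for ch in teacher_latent] + [[5.0, -3.0] for _ in range(12)]
loss = channel_split_distillation_loss(student_latent, teacher_latent)
```

In this toy setup the loss is zero whatever the extra channels contain, which is the property the abstract describes: the first four channels stay tied to the pre-trained prior while the added capacity is unconstrained by this term. A real training objective would presumably combine such a term with reconstruction losses, which this sketch omits.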