CV AI LG Sep 26, 2022

LSAP: Rethinking Inversion Fidelity, Perception and Editability in GAN Latent Space

Xuekun Zhao, Pu Cao, Xiaoya Yang, Mingjian Zhang, Lu Yang, Qing Song

arXiv:2209.12746v35.77 citations h-index: 77 Has Code

Originality: Highly original

AI Analysis

The work addresses a key bottleneck in GAN-based image editing and generation: the quality of the inverted latent code. It offers researchers and practitioners a unified approach to improving inversion outcomes.

The paper tackles the challenge of improving both reconstruction fidelity and perception/editability in GAN image inversion by proposing LSAP, a paradigm that aligns inverted latent codes with the synthetic distribution, achieving state-of-the-art performance across domains.

As research on image inversion advances, the process is generally divided into two stages. The first stage, Image Embedding, uses an encoder or an optimization procedure to embed an image and obtain its corresponding latent code. The second stage, Result Refinement, further improves the inversion and editing outcomes. Although this refinement stage substantially enhances reconstruction fidelity, perception and editability remain largely unchanged and depend heavily on the latent codes obtained in the first stage. A key challenge is therefore to obtain latent codes that preserve reconstruction fidelity while also improving perception and editability. In this work, we first reveal that these two properties are closely related to the degree of alignment (or disalignment) between the inverted latent codes and the synthetic distribution. Based on this insight, we propose the Latent Space Alignment Inversion Paradigm (LSAP), which comprises both an evaluation metric and a unified inversion solution. Specifically, we introduce the Normalized Style Space ($\mathcal{S^N}$ space) and the Normalized Style Space Cosine Distance (NSCD) to quantify the disalignment of inversion methods. Moreover, our paradigm can be optimized for both encoder-based and optimization-based embeddings, providing a consistent alignment framework. Extensive experiments across various domains demonstrate that NSCD effectively captures perceptual and editable characteristics, and that our alignment paradigm achieves state-of-the-art performance in both stages of inversion.
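The abstract names the Normalized Style Space and NSCD but does not spell out their formulas. The sketch below is only one plausible reading, not the paper's definition: it assumes that normalization means standardizing each style dimension with the mean and standard deviation of sampled synthetic codes, and that the distance is a plain cosine distance between an inverted code and a reference synthetic code in that normalized space. All function names, the array shapes, and the choice of reference code are hypothetical.

```python
import numpy as np


def fit_normalizer(synthetic_codes, eps=1e-8):
    """Estimate per-dimension mean and std from sampled synthetic codes.

    synthetic_codes: array of shape (num_samples, dim). Assumption: these
    would come from sampling the generator's latent distribution.
    """
    mean = synthetic_codes.mean(axis=0)
    std = synthetic_codes.std(axis=0) + eps  # eps guards against zero variance
    return mean, std


def normalize(code, mean, std):
    """Standardize a code per dimension (assumed form of the normalized space)."""
    return (code - mean) / std


def cosine_distance(a, b, eps=1e-8):
    """Return 1 - cosine similarity of two flattened vectors."""
    a = a.ravel()
    b = b.ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b) + eps
    return 1.0 - float(a @ b / denom)


def normalized_cosine_distance(inverted, reference, mean, std):
    """Cosine distance between two codes after normalization.

    Hypothetical stand-in for an NSCD-like score: 0 means the codes point
    the same way in the normalized space, 2 means opposite directions.
    """
    return cosine_distance(normalize(inverted, mean, std),
                           normalize(reference, mean, std))


# Toy usage with random stand-ins for synthetic style codes.
rng = np.random.default_rng(0)
synthetic = rng.normal(size=(1000, 16))
mean, std = fit_normalizer(synthetic)
reference = synthetic[0]
mirrored = 2 * mean - reference  # reference reflected through the mean
```

Under these assumptions, a code compared with itself scores close to 0 and a code reflected through the synthetic mean scores close to 2, so larger values indicate stronger disalignment from the reference.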
