CV AI · Mar 30, 2025

GMapLatent: Geometric Mapping in Latent Space

Wei Zeng, Xuebin Chang, Jianghao Su, Xiang Gu, Jian Sun, Zongben Xu

arXiv:2503.23407v1 · 6.21 citations · h-index: 11

Originality: Incremental advance

AI Analysis

This work addresses a specific bottleneck in cross-domain image generation for researchers and practitioners in computer vision, though it appears incremental as it builds on existing encoder-decoder architectures.

The paper tackles mode collapse and mode mixture in cross-domain generative models. It introduces a geometric mapping method that aligns the cross-domain latent spaces under strict cluster correspondence, and the authors report superior performance over existing models on gray-scale and color images.

Cross-domain generative models based on encoder-decoder AI architectures have attracted much attention in generating realistic images, where domain alignment is crucial for generation accuracy. Domain alignment methods usually deal directly with the initial distribution; however, mismatched or mixed clusters can lead to mode collapse and mixture problems in the decoder, compromising model generalization capabilities. In this work, we propose a cross-domain alignment and generation model that introduces a canonical latent space representation based on geometric mapping to align the cross-domain latent spaces in a rigorous and precise manner, thus avoiding mode collapse and mixture in encoder-decoder generation architectures. We name this model GMapLatent. The core of the method is to seamlessly align latent spaces with strict cluster correspondence constraints using the canonical parameterizations of cluster-decorated latent spaces. We first (1) transform the latent space to a canonical parameter domain by composing barycenter translation, optimal transport merging, and constrained harmonic mapping, and then (2) compute geometric registration with cluster constraints over the canonical parameter domains. This process realizes a bijective (one-to-one and onto) mapping between the newly transformed latent spaces and generates a precise alignment of cluster pairs. Cross-domain generation is then achieved through the aligned latent spaces embedded in the encoder-decoder pipeline. Experiments on gray-scale and color images validate the efficiency, efficacy, and applicability of GMapLatent, and demonstrate that the proposed model has superior performance over existing models.
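
To make the idea of cluster-constrained alignment concrete, here is a minimal Python sketch. It is not the GMapLatent algorithm: it keeps only the barycenter translation step from the abstract and replaces optimal transport merging, constrained harmonic mapping, and geometric registration with a simple per-cluster centroid shift. The function names `barycenter_translate` and `align_clusters` are hypothetical, and the sketch assumes that cluster labels are already known and shared by both domains.

```python
import numpy as np


def barycenter_translate(z):
    """Shift latent codes so their barycenter (mean) sits at the origin."""
    return z - z.mean(axis=0, keepdims=True)


def align_clusters(z_src, y_src, z_tgt, y_tgt):
    """Toy cluster-constrained alignment (an assumption, not the paper's method).

    Both latent clouds are centered, then every source cluster is translated
    so that its centroid coincides with the centroid of the target cluster
    carrying the same label.
    """
    z_src = barycenter_translate(np.asarray(z_src, dtype=float))
    z_tgt = barycenter_translate(np.asarray(z_tgt, dtype=float))
    y_src = np.asarray(y_src)
    y_tgt = np.asarray(y_tgt)

    # Strict correspondence: both domains must contain the same cluster labels.
    if set(np.unique(y_src)) != set(np.unique(y_tgt)):
        raise ValueError("source and target must share the same cluster labels")

    aligned = np.empty_like(z_src)
    for label in np.unique(y_src):
        src_mask = y_src == label
        tgt_mask = y_tgt == label
        shift = z_tgt[tgt_mask].mean(axis=0) - z_src[src_mask].mean(axis=0)
        aligned[src_mask] = z_src[src_mask] + shift
    return aligned, z_tgt
```

A rigid per-cluster shift matches centroids only; it does not guarantee the bijective mapping between latent spaces that the abstract describes, which is the role of the harmonic mapping and registration steps.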
