CV · Nov 19, 2025

Jointly Conditioned Diffusion Model for Multi-View Pose-Guided Person Image Synthesis

arXiv:2511.15092v1
Originality: Incremental advance
AI Analysis

This work improves multi-view person image synthesis for applications such as virtual try-on and animation. The advance is incremental: it builds on existing diffusion models with targeted modifications.

The paper addresses two limitations of pose-guided human image generation: incomplete textures from single reference views and the lack of cross-view interaction. It reports state-of-the-art fidelity and cross-view consistency.

Pose-guided human image generation is limited by incomplete textures from single reference views and the absence of explicit cross-view interaction. We present the jointly conditioned diffusion model (JCDM), a framework that exploits multi-view priors. The appearance prior module (APM) infers a holistic, identity-preserving prior from incomplete references, and the joint conditional injection (JCI) mechanism fuses multi-view cues and injects shared conditioning into the denoising backbone to align identity, color, and texture across poses. JCDM supports a variable number of reference views and integrates with standard diffusion backbones with minimal, targeted architectural modifications. Experiments demonstrate state-of-the-art fidelity and cross-view consistency.
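
The abstract does not say how JCI fuses cues from a variable number of reference views. As a rough illustration only, one common way to get a single shared conditioning vector from any number of views is softmax-weighted pooling. The sketch below assumes each reference view has already been encoded into a feature vector. The names `fuse_views`, `view_feats`, and `query` are hypothetical and do not come from the paper, and this is not the authors' method.

```python
import numpy as np


def fuse_views(view_feats, query):
    """Pool V per-view feature vectors into one shared conditioning vector.

    view_feats: array of shape (V, D), one row per reference view (V >= 1).
    query:      array of shape (D,), e.g. an embedding of the target pose.
    Both inputs are assumptions made for illustration, not the paper's API.
    """
    dim = view_feats.shape[1]
    # Scaled dot-product score of each view against the query.
    scores = view_feats @ query / np.sqrt(dim)
    # Subtract the maximum before exponentiating for numerical stability.
    scores = scores - scores.max()
    weights = np.exp(scores)
    weights = weights / weights.sum()
    # Weighted sum over views: shape (D,) regardless of how many views exist.
    fused = weights @ view_feats
    return fused, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pose_query = rng.normal(size=8)
    for num_views in (1, 2, 4):
        feats = rng.normal(size=(num_views, 8))
        fused, weights = fuse_views(feats, pose_query)
        print(num_views, fused.shape, weights.round(3))
```

Because the weights are normalized over whatever views are present, the output has the same shape for one view or many, and it does not depend on the order in which views are supplied. In a real system the fused vector would then be passed to the denoising backbone as conditioning. How JCDM does this is not described in the text above.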
