CV · May 1

Pose-Aware Diffusion for 3D Generation

arXiv:2605.00345
AI Analysis

This work addresses the challenge of pose-aligned 3D generation from single images, which is critical for applications like AR/VR and robotics.

PAD (Pose-Aware Diffusion) is an end-to-end diffusion framework that generates 3D geometry directly in observation space. It uses monocular depth unprojection as a geometric anchor to enforce spatial alignment. It achieves better geometric alignment and image-to-3D correspondence than state-of-the-art methods and extends to compositional scene reconstruction.
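The depth-unprojection step can be illustrated with a standard pinhole back-projection. This is a minimal sketch under assumptions the source does not state. It assumes a pinhole camera with known intrinsics (`fx`, `fy`, `cx`, `cy`) and OpenCV-style axes (x right, y down, z forward). It also assumes the depth map gives distance along the optical axis. The function name `unproject_depth` and the optional object `mask` are hypothetical, and PAD's actual depth estimator and intrinsics handling are not described here.

```python
import numpy as np


def unproject_depth(depth, fx, fy, cx, cy, mask=None):
    """Lift an HxW depth map to an (N, 3) point cloud in camera coordinates.

    Hypothetical helper. Assumes a pinhole camera, OpenCV axes
    (x right, y down, z forward), z-depth along the optical axis,
    and integer pixel coordinates (no half-pixel offset).
    """
    h, w = depth.shape
    # Pixel grid: u indexes columns, v indexes rows; both arrays are (h, w).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Keep pixels with positive depth, optionally restricted to an object mask.
    valid = depth > 0
    if mask is not None:
        valid &= mask.astype(bool)
    z = depth[valid]
    # Invert the projection u = fx * x / z + cx and v = fy * y / z + cy.
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    return np.stack([x, y, z], axis=-1)


# Toy example: a flat plane 2 units from the camera, masked to a 2x2 patch.
depth = np.full((4, 4), 2.0)
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True  # hypothetical object mask (central 2x2 pixels)
points = unproject_depth(depth, fx=2.0, fy=2.0, cx=1.5, cy=1.5, mask=mask)
print(points.shape)  # (4, 3)
```

Each retained pixel yields one 3D point, so the result covers only the visible surface. That is why the source calls it a partial point cloud, which PAD injects as its 3D geometric anchor.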

Generating pose-aligned 3D objects is challenging due to the spatial mismatches and transformation ambiguities inherent in decoupled canonical-then-rotate paradigms. To this end, we introduce Pose-Aware Diffusion (PAD), a novel end-to-end diffusion framework that synthesizes 3D geometry directly within the observation space. By unprojecting monocular depth into a partial point cloud and explicitly injecting it as a 3D geometric anchor, PAD abandons canonical assumptions to enforce rigorous spatial supervision. This native generation intrinsically resolves pose ambiguity, producing high-fidelity pose-aligned assets. Extensive experiments demonstrate that PAD achieves superior geometric alignment and image-to-3D correspondence compared to state-of-the-art methods. Additionally, PAD naturally extends to compositional 3D scene reconstruction via a simple union of independently generated objects, highlighting its robust ability to preserve precise spatial layouts.
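The compositional claim can be illustrated by contrasting two ways of assembling a scene. In the sketch below, the functions `compose_scene`, `compose_from_canonical`, and `rot_z` and the toy box data are hypothetical illustrations, not the paper's code. Objects already expressed in the shared camera frame can be merged by plain concatenation. Canonical-space objects must each be mapped by an estimated rotation `R` and translation `t` first. Any rotation error then moves points by an amount that grows with the object's extent.

```python
import numpy as np


def compose_scene(objects):
    """Observation-space union: every object is already in the shared
    camera frame, so the scene is just the concatenation of point sets."""
    return np.concatenate(objects, axis=0)


def compose_from_canonical(objects, poses):
    """Canonical-then-rotate alternative (hypothetical): each object lives
    in its own canonical frame and is placed by an estimated pose (R, t)."""
    placed = [pts @ R.T + t for pts, (R, t) in zip(objects, poses)]
    return np.concatenate(placed, axis=0)


def rot_z(deg):
    """Rotation matrix about the z axis by `deg` degrees."""
    a = np.deg2rad(deg)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


# Toy 2 x 1 box outline in a canonical frame (hypothetical data).
box = np.array([[-1.0, -0.5, 0.0], [1.0, -0.5, 0.0],
                [1.0, 0.5, 0.0], [-1.0, 0.5, 0.0]])
true_poses = [(rot_z(30), np.array([0.0, 0.0, 5.0])),
              (np.eye(3), np.array([2.0, 0.0, 6.0]))]

# Stand-in for objects generated directly in observation space (exact here).
observed = [box @ R.T + t for R, t in true_poses]
scene = compose_scene(observed)

# Canonical route with a 10-degree error on the first object's rotation.
noisy_poses = [(rot_z(40), true_poses[0][1]), true_poses[1]]
scene_noisy = compose_from_canonical([box, box], noisy_poses)
err = np.linalg.norm(scene - scene_noisy, axis=1).max()
print(round(err, 3))  # 0.195
```

With a 10° rotation error, every corner of the box (half-diagonal √1.25 ≈ 1.118) shifts by 2 · 1.118 · sin 5° ≈ 0.195. The observation-space union needs no per-object pose at all.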
