CV · May 1

Pose-Aware Diffusion for 3D Generation

Zihan Zhou, Luxi Chen, Jingzhi Zhou, Yuhao Wan, Min Zhao, Baoyu Fan, Chongxuan Li

arXiv:2605.0034592.6

AI Analysis

This work addresses the challenge of pose-aligned 3D generation from single images, which is critical for applications like AR/VR and robotics.

PAD introduces an end-to-end diffusion framework that generates 3D geometry directly in observation space, using monocular depth unprojection as a geometric anchor to enforce spatial alignment. It achieves superior geometric alignment and image-to-3D correspondence compared to state-of-the-art methods, and extends to compositional scene reconstruction.
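The geometric anchor rests on a standard step: back-projecting a depth map into camera-space 3D points. The sketch below is a minimal illustration only. It assumes a pinhole camera with known intrinsics (`fx`, `fy`, `cx`, `cy`) and a metric depth map from some monocular estimator. The function name `unproject_depth` and the optional object mask are hypothetical. The listing does not say how PAD obtains intrinsics or how it injects the resulting cloud into the diffusion model.

```python
import numpy as np

def unproject_depth(depth, fx, fy, cx, cy, mask=None):
    """Back-project a depth map of shape (H, W) to camera-space points (N, 3).

    Hypothetical helper assuming a pinhole model:
        X = (u - cx) * Z / fx,  Y = (v - cy) * Z / fy,  Z = depth[v, u]
    Pixels with non-positive depth, or outside `mask`, are dropped, so the
    result is a *partial* point cloud covering only the visible surface.
    """
    depth = np.asarray(depth, dtype=np.float64)
    h, w = depth.shape
    # u indexes columns (image x), v indexes rows (image y); both have shape (H, W)
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth > 0
    if mask is not None:
        valid &= np.asarray(mask, dtype=bool)
    z = depth[valid]
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    # Points stay in the observation (camera) frame; no canonicalization is applied
    return np.stack([x, y, z], axis=-1)

# Usage: a 2x2 depth map at constant depth 2 with unit focal length
pts = unproject_depth(np.full((2, 2), 2.0), fx=1.0, fy=1.0, cx=0.0, cy=0.0)
```

Because the points are never moved into a canonical frame, any shape generated around them inherits the observed pose directly.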

Generating pose-aligned 3D objects is challenging due to the spatial mismatches and transformation ambiguities inherent in decoupled canonical-then-rotate paradigms. To address this, we introduce Pose-Aware Diffusion (PAD), a novel end-to-end diffusion framework that synthesizes 3D geometry directly within the observation space. By unprojecting monocular depth into a partial point cloud and explicitly injecting it as a 3D geometric anchor, PAD abandons canonical assumptions to enforce rigorous spatial supervision. This native generation intrinsically resolves pose ambiguity, producing high-fidelity pose-aligned assets. Extensive experiments demonstrate that PAD achieves superior geometric alignment and image-to-3D correspondence compared to state-of-the-art methods. Additionally, PAD naturally extends to compositional 3D scene reconstruction via a simple union of independently generated objects, highlighting its robust ability to preserve precise spatial layouts.
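The compositional claim can be made concrete by contrasting the two paradigms. The sketch below is illustrative and assumes each generated object is represented as an (N, 3) point array. Both helper names are hypothetical. With observation-space outputs, the scene is simply the union (concatenation) of per-object points. A canonical-then-rotate pipeline must first apply an estimated rotation and translation per object, and any error in that pose misplaces the object.

```python
import numpy as np

def compose_scene(objects):
    """Hypothetical helper: union of per-object points already in the shared camera frame.

    No per-object transform is applied, because each object is assumed to have
    been generated directly in observation space; concatenation keeps the layout.
    """
    parts = [np.asarray(p, dtype=np.float64).reshape(-1, 3) for p in objects]
    return np.concatenate(parts, axis=0)

def place_canonical(points, rotation, translation):
    """Hypothetical canonical-then-rotate baseline: p_cam = R @ p_canon + t.

    Needs an estimated pose (R, t) for every object before composition.
    """
    pts = np.asarray(points, dtype=np.float64).reshape(-1, 3)
    R = np.asarray(rotation, dtype=np.float64)
    t = np.asarray(translation, dtype=np.float64)
    # Row-vector form of R @ p for each point
    return pts @ R.T + t

# Usage: two objects already in camera space are composed by union alone
scene = compose_scene([np.zeros((2, 3)), np.ones((3, 3))])

# Baseline: a canonical point must be rotated 90 degrees about z and shifted
Rz = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
placed = place_canonical([[1.0, 0.0, 0.0]], Rz, [0.0, 0.0, 5.0])
```

In the baseline, the placement depends entirely on the quality of `Rz` and the translation. In the observation-space path, there is no pose to estimate at composition time.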
