CV · Jul 11, 2025

From One to More: Contextual Part Latents for 3D Generation

arXiv:2507.08772v2 · 20 citations · h-index: 9
Originality: Incremental advance
AI Analysis

This work addresses fine-grained controllability in 3D generation for applications such as compositional design. It appears incremental because it builds on existing diffusion frameworks.

The paper addresses a limitation of single-latent representations in 3D generation: they fail to capture complex multi-part geometries. It proposes CoPart, a part-aware diffusion framework that decomposes 3D objects into contextual part latents. The authors report superior capabilities in part-level editing, articulated object generation, and scene composition.

Recent advances in 3D generation have transitioned from multi-view 2D rendering approaches to 3D-native latent diffusion frameworks that exploit geometric priors in ground truth data. Despite progress, three key limitations persist: (1) Single-latent representations fail to capture complex multi-part geometries, causing detail degradation; (2) Holistic latent coding neglects part independence and interrelationships critical for compositional design; (3) Global conditioning mechanisms lack fine-grained controllability. Inspired by human 3D design workflows, we propose CoPart - a part-aware diffusion framework that decomposes 3D objects into contextual part latents for coherent multi-part generation. This paradigm offers three advantages: i) Reduces encoding complexity through part decomposition; ii) Enables explicit part relationship modeling; iii) Supports part-level conditioning. We further develop a mutual guidance strategy to fine-tune pre-trained diffusion models for joint part latent denoising, ensuring both geometric coherence and foundation model priors. To enable large-scale training, we construct Partverse - a novel 3D part dataset derived from Objaverse through automated mesh segmentation and human-verified annotations. Extensive experiments demonstrate CoPart's superior capabilities in part-level editing, articulated object generation, and scene composition with unprecedented controllability.
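
The abstract's central idea (one latent per part, updated jointly so that each part sees the others, with conditioning applied per part) can be illustrated with a toy sketch. This is not CoPart's denoiser or its mutual guidance strategy, whose details the abstract does not give. The function names (`cross_part_context`, `joint_update`), the single-vector latents, and the `mix` and `step` parameters are all hypothetical, chosen only to show how the data flows.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cross_part_context(parts: List[Vector]) -> List[Vector]:
    """For each part latent, return an attention-weighted average of all
    part latents (itself included). Toy stand-in for part-relationship
    modeling; assumes every latent has the same dimension."""
    dim = len(parts[0])
    scale = 1.0 / math.sqrt(dim)
    contexts = []
    for query in parts:
        weights = softmax([dot(query, key) * scale for key in parts])
        contexts.append(
            [sum(w * key[i] for w, key in zip(weights, parts)) for i in range(dim)]
        )
    return contexts


def joint_update(
    parts: List[Vector],
    conditions: List[Vector],
    mix: float = 0.5,
    step: float = 0.1,
) -> List[Vector]:
    """One toy joint update (hypothetical, not a real diffusion step).
    Each part latent moves a fraction `step` toward a blend of its own
    condition (part-level conditioning) and its cross-part context."""
    contexts = cross_part_context(parts)
    updated = []
    for latent, cond, ctx in zip(parts, conditions, contexts):
        target = [mix * c + (1.0 - mix) * x for c, x in zip(cond, ctx)]
        updated.append([z + step * (t - z) for z, t in zip(latent, target)])
    return updated


# Three parts with 2-dimensional latents and one condition vector per part.
parts = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
conditions = [[0.5, 0.5], [0.0, 2.0], [2.0, 0.0]]
new_parts = joint_update(parts, conditions)
```

In this sketch, changing the condition of one part alters only that part's update within a single step, while the attention context couples the parts across repeated steps. That is the sense in which part-level conditioning and explicit part relationships can coexist in one update.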

