CV · Jun 11

JointEdit3D: Feed-Forward 3D Scene Editing in a Unified Latent Space

arXiv:2606.13345v18.8
Predicted impact: top 54% in CV · last 90 days
Originality: Incremental advance
AI Analysis

This work targets the high test-time cost and structural inconsistencies of existing 3D scene editing methods, offering a feed-forward alternative to per-scene optimization.

JointEdit3D is a feed-forward 3D scene editing method that operates in a unified latent space for RGB and geometry. It reports improved edited-region quality and 3D structural completeness over prior baselines while maintaining competitive background preservation. The authors also introduce SceneEdit3D-15K, a dataset of 15K paired editing samples, and SceneEdit3D-Bench, a curated 100-sample benchmark.

Existing 3D scene editing methods typically rely on per-scene optimization over explicit 3D representations or cascaded edit-and-reconstruct pipelines, resulting in high test-time cost, limited 3D awareness, and structural inconsistencies. To couple appearance synthesis and geometry prediction during editing, we build on a unified RGB-geometry reconstruction-generation latent space and adapt it to feed-forward 3D scene editing. The resulting framework, JointEdit3D, performs asymmetric latent inpainting by observing only a single edited RGB reference latent and generating the remaining RGB views and edited geometry latent under source-scene anchoring. JointEdit3D introduces a dedicated SceneAnchor Branch to inject source-scene structure without forcing direct copying, and adopts edit/background-aware losses to balance edited-region fidelity with unedited-content preservation. To address the lack of paired resources for standardized 3D scene editing evaluation, we introduce SceneEdit3D-15K, a dataset with 15K paired editing samples and renderer-provided 3D annotations, together with SceneEdit3D-Bench, a curated 100-sample benchmark. Experiments show that JointEdit3D improves edited-region quality and 3D structural completeness over prior baselines while maintaining competitive background preservation.
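
The abstract does not give the form of the edit/background-aware losses, so the following is only an illustrative sketch of the general idea: a squared-error term averaged separately over the edited region and the background, then combined with two weights. The function name `edit_background_loss`, the binary `edit_mask`, and the weights `w_edit` and `w_bg` are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def edit_background_loss(pred, target, edit_mask, w_edit=1.0, w_bg=1.0):
    """Region-weighted mean squared error (hypothetical, not the paper's loss).

    pred, target : arrays of the same shape (e.g. latents or images).
    edit_mask    : array broadcastable to pred; 1 inside the edited region,
                   0 in the background.
    w_edit, w_bg : assumed scalar weights trading off edit fidelity
                   against background preservation.
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    mask = np.broadcast_to(np.asarray(edit_mask, dtype=np.float64), pred.shape)

    sq_err = (pred - target) ** 2
    eps = 1e-8  # guards against an empty region

    # Average the error separately per region so a small edit is not
    # drowned out by a large background (and vice versa).
    edit_term = (sq_err * mask).sum() / (mask.sum() + eps)
    bg_term = (sq_err * (1.0 - mask)).sum() / ((1.0 - mask).sum() + eps)

    return w_edit * edit_term + w_bg * bg_term


# Toy usage: one edited pixel, three background pixels.
pred = np.zeros((2, 2))
target = np.array([[1.0, 0.0], [0.0, 2.0]])
edit_mask = np.array([[1.0, 0.0], [0.0, 0.0]])
loss = edit_background_loss(pred, target, edit_mask, w_edit=2.0, w_bg=3.0)
```

Normalizing each term by its own region size is one possible design choice; it keeps the balance between the two terms independent of how much of the scene is edited.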
