Advancing 3D Gaussian Splatting Editing with Complementary and Consensus Information
This work addresses visual fidelity and consistency issues in 3D scene editing, with applications in computer graphics and virtual reality, and represents an incremental improvement over existing approaches.
The paper tackles inconsistent geometric reconstructions and over-texture artifacts in text-guided 3D Gaussian Splatting editing. It introduces a complementary information mutual learning network and a wavelet consensus attention mechanism, which together yield better rendering quality and view consistency than state-of-the-art methods.
We present a novel framework for improving the visual fidelity and consistency of text-guided 3D Gaussian Splatting (3DGS) editing. Existing editing approaches face two critical challenges: inconsistent geometric reconstruction across viewpoints, particularly at challenging camera positions, and ineffective use of depth information during image manipulation, which produces over-texture artifacts and degraded object boundaries. To address these limitations, we introduce (1) a complementary information mutual learning network that enhances depth map estimation from 3DGS, enabling precise depth-conditioned 3D editing while preserving geometric structure, and (2) a wavelet consensus attention mechanism that aligns latent codes during the diffusion denoising process, ensuring multi-view consistency in the edited results. Extensive experiments show that our method outperforms state-of-the-art approaches in rendering quality and view consistency, validating the framework as an effective solution for text-guided editing of 3D scenes.