FaithFusion: Harmonizing Reconstruction and Generation via Pixel-wise Information Gain
This work addresses the challenge of maintaining geometric fidelity and visual plausibility in 3D scene synthesis for autonomous driving applications, representing an incremental improvement over existing methods.
The paper tackles the problem of fusing geometry-based 3DGS and appearance-driven diffusion models for controllable driving-scene reconstruction and generation. It reports state-of-the-art performance on NTA-IoU, NTL-IoU, and FID on the Waymo dataset, with an FID of 107.47 at a 6-meter lane shift.
In controllable driving-scene reconstruction and 3D scene generation, maintaining geometric fidelity while synthesizing visually plausible appearance under large viewpoint shifts is crucial. However, effective fusion of geometry-based 3DGS and appearance-driven diffusion models faces inherent challenges: without pixel-wise, 3D-consistent editing criteria, fusion often leads to over-restoration and geometric drift. To address these issues, we introduce \textbf{FaithFusion}, a 3DGS-diffusion fusion framework driven by pixel-wise Expected Information Gain (EIG). EIG acts as a unified policy for coherent spatio-temporal synthesis: it guides diffusion as a spatial prior to refine high-uncertainty regions, while its pixel-level weighting distills the edits back into 3DGS. The resulting plug-and-play system requires no extra prior conditions and no structural modifications. Extensive experiments on the Waymo dataset demonstrate that our approach attains SOTA performance across NTA-IoU, NTL-IoU, and FID, maintaining an FID of 107.47 even at a 6-meter lane shift. Our code is available at https://github.com/wangyuanbiubiubiu/FaithFusion.
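The abstract does not say how EIG is computed or how it enters the loss, so the following is only a minimal sketch of the two roles it describes, under stated assumptions. It assumes a per-pixel gain map is already available, rescales it with min-max normalization, thresholds it to get a region for diffusion to refine, and uses it as a per-pixel weight in an L1 distillation loss. The function names (`normalize_gain`, `high_gain_mask`, `eig_weighted_l1`), the normalization, the threshold, and the choice of L1 are all hypothetical and are not taken from the FaithFusion code.

```python
import numpy as np


def normalize_gain(eig, eps=1e-8):
    """Rescale a per-pixel gain map to [0, 1] (min-max; an assumed choice)."""
    eig = np.asarray(eig, dtype=np.float64)
    span = eig.max() - eig.min()
    if span < eps:
        # A flat map carries no spatial preference; treat every weight as 0.
        return np.zeros_like(eig)
    return (eig - eig.min()) / span


def high_gain_mask(eig, threshold=0.5):
    """Boolean (H, W) mask of pixels whose normalized gain exceeds a threshold.

    Hypothetical stand-in for using the gain map as a spatial prior that
    tells the diffusion model where to edit.
    """
    return normalize_gain(eig) > threshold


def eig_weighted_l1(rendered, edited, eig):
    """Gain-weighted L1 between a 3DGS render and a diffusion-edited target.

    rendered, edited: (H, W, 3) float arrays; eig: (H, W) gain map.
    Pixels with low gain contribute little, so the render is pulled toward
    the edit mainly where the gain is high.
    """
    rendered = np.asarray(rendered, dtype=np.float64)
    edited = np.asarray(edited, dtype=np.float64)
    w = normalize_gain(eig)[..., None]        # (H, W, 1), broadcast over RGB
    per_pixel = np.abs(rendered - edited)     # (H, W, 3)
    denom = w.sum() * rendered.shape[-1] + 1e-8
    return float((w * per_pixel).sum() / denom)


if __name__ == "__main__":
    gain = np.array([[0.0, 1.0], [2.0, 4.0]])
    render = np.zeros((2, 2, 3))
    target = np.ones((2, 2, 3))
    print(high_gain_mask(gain))
    print(eig_weighted_l1(render, target, gain))
```

In this sketch, a pixel with the lowest gain gets zero weight, so differences there do not affect the loss. That mirrors the stated goal of avoiding over-restoration in regions the reconstruction already covers well.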