Drop-In Perceptual Optimization for 3D Gaussian Splatting
This work addresses the poor perceptual quality of 3DGS renderings for human viewers with a drop-in loss replacement that improves visual fidelity without increasing the splat count, and that also benefits scene compression.
The paper tackles blurry renderings in 3D Gaussian Splatting (3DGS) by systematically exploring perceptual optimization strategies. It finds that a regularized Wasserstein Distortion (WD-R) loss improves perceptual quality: human raters prefer it more than 2.3x over the original 3DGS loss, and it achieves state-of-the-art LPIPS, DISTS, and FID scores.
Despite their output being ultimately consumed by human viewers, 3D Gaussian Splatting (3DGS) methods often rely on ad-hoc combinations of pixel-level losses, resulting in blurry renderings. To address this, we systematically explore perceptual optimization strategies for 3DGS by searching over a diverse set of distortion losses. We conduct a first-of-its-kind large-scale human subjective study on 3DGS, involving 39,320 pairwise ratings across several datasets and 3DGS frameworks. A regularized version of Wasserstein Distortion, which we call WD-R, emerges as the clear winner, excelling at recovering fine textures without incurring a higher splat count. WD-R is preferred by raters more than $2.3\times$ over the original 3DGS loss, and $1.5\times$ over the current best method, Perceptual-GS. WD-R also consistently achieves state-of-the-art LPIPS, DISTS, and FID scores across various datasets. It generalizes across recent frameworks, such as Mip-Splatting and Scaffold-GS, where replacing the original loss with WD-R consistently enhances perceptual quality within a similar resource budget (number of splats for Mip-Splatting, model size for Scaffold-GS) and leads to reconstructions being preferred by human raters $1.8\times$ and $3.6\times$ over those of the original loss, respectively. We also find that this carries over to the task of 3DGS scene compression, with $\approx 50\%$ bitrate savings for comparable perceptual metric performance.
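To give a rough intuition for why a Wasserstein-style loss can favor fine textures over blur, the sketch below compares two images through local statistics rather than pixel by pixel. It is only an illustration under several assumptions that do not come from the abstract: it works on raw grayscale intensities instead of learned features, pools with a square box window of a fixed `radius`, models each neighbourhood as a 1D Gaussian (for which the squared 2-Wasserstein distance is $(\mu_1-\mu_2)^2 + (\sigma_1-\sigma_2)^2$), and omits the regularization that defines WD-R, since the abstract does not specify it. The function names `local_stats` and `wasserstein_distortion_sketch` are hypothetical.

```python
import numpy as np


def local_stats(img, radius):
    """Per-pixel mean and std over a (2r+1) x (2r+1) box window.

    Assumption: box pooling with reflect padding; other pooling kernels
    are equally plausible.
    """
    k = 2 * radius + 1
    padded = np.pad(np.asarray(img, dtype=float), radius, mode="reflect")
    # Shape (H, W, k, k): one k x k window per output pixel.
    windows = np.lib.stride_tricks.sliding_window_view(padded, (k, k))
    return windows.mean(axis=(-2, -1)), windows.std(axis=(-2, -1))


def wasserstein_distortion_sketch(a, b, radius):
    """Mean squared 2-Wasserstein distance between local 1D Gaussians.

    Illustrative only: unregularized, pixel-space, not the paper's WD-R.
    """
    mu_a, sd_a = local_stats(a, radius)
    mu_b, sd_b = local_stats(b, radius)
    return float(np.mean((mu_a - mu_b) ** 2 + (sd_a - sd_b) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two independent noise "textures" with identical statistics.
    tex_a = rng.random((32, 32))
    tex_b = rng.random((32, 32))
    print("radius 0 (plain MSE):", wasserstein_distortion_sketch(tex_a, tex_b, 0))
    print("radius 8 (local stats):", wasserstein_distortion_sketch(tex_a, tex_b, 8))
```

With `radius=0` each window holds a single pixel, so the standard deviation vanishes and the sketch reduces to plain mean squared error. With a larger radius, two different realizations of the same texture score as close, whereas a pixel-level loss would penalize them heavily and push the optimizer toward a blurry average. A training loop would need a differentiable implementation (for example in an autodiff framework); this NumPy version is meant for inspection only.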