Balanced 3DGS: Gaussian-wise Parallelism Rendering with Fine-Grained Tiling
This work targets the training inefficiency faced by researchers and practitioners using 3DGS; its contribution is incremental, since it optimizes an existing method.
The paper tackles load imbalance in 3D Gaussian Splatting (3DGS) training, which slows rendering. It introduces Balanced 3DGS, which combines techniques such as Gaussian-wise parallelism and fine-grained tiling and speeds up the forward renderCUDA kernel by up to 7.52x.
3D Gaussian Splatting (3DGS) is attracting increasing attention in both academia and industry owing to its superior visual quality and rendering speed. However, training a 3DGS model remains time-intensive, especially in load-imbalanced scenarios where workload diversity among pixels and Gaussian spheres causes poor renderCUDA kernel performance. We introduce Balanced 3DGS, a Gaussian-wise parallel rendering approach with fine-grained tiling for the 3DGS training process, designed to resolve these load-imbalance issues. First, we introduce an inter-block dynamic workload distribution technique that dynamically maps workloads to Streaming Multiprocessor (SM) resources within a single GPU, which constitutes the foundation of load balancing. Second, we are the first to propose a Gaussian-wise parallel rendering technique that significantly reduces workload divergence inside a warp, which serves as a critical component in addressing load imbalance. Building on these two methods, we further propose a fine-grained combined load balancing technique that distributes workload uniformly across all SMs and boosts forward renderCUDA kernel performance by up to 7.52x. In addition, we present a self-adaptive render kernel selection strategy that chooses among render kernels during 3DGS training according to the load-balance situation, which effectively improves training efficiency.
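To make the idea behind inter-block dynamic workload distribution more concrete, the following is a minimal Python simulation, not the paper's CUDA kernel. It compares a fixed, round-robin assignment of tiles to SMs against a scheme in which each SM pulls the next tile as soon as it becomes idle. The function names, the per-tile costs, and the SM count are all hypothetical and chosen only for illustration; the actual mechanism in Balanced 3DGS may differ.

```python
import heapq


def static_makespan(tile_costs, num_sms):
    """Finish time when tiles are pre-assigned to SMs round-robin (assumed baseline)."""
    loads = [0] * num_sms
    for i, cost in enumerate(tile_costs):
        loads[i % num_sms] += cost
    # The kernel finishes only when the most heavily loaded SM finishes.
    return max(loads)


def dynamic_makespan(tile_costs, num_sms):
    """Finish time when each idle SM pulls the next tile from a shared queue."""
    # Min-heap of the times at which each SM becomes free.
    free_at = [0] * num_sms
    heapq.heapify(free_at)
    for cost in tile_costs:
        earliest = heapq.heappop(free_at)       # SM that becomes idle first
        heapq.heappush(free_at, earliest + cost)  # it takes the next tile
    return max(free_at)


# Hypothetical workload: a few heavy tiles (many Gaussians) among light ones.
tile_costs = [100, 1, 1, 1, 100, 1, 1, 1, 100, 1, 1, 1]
NUM_SMS = 4  # illustrative value, far smaller than a real GPU

static_time = static_makespan(tile_costs, NUM_SMS)
dynamic_time = dynamic_makespan(tile_costs, NUM_SMS)
print(f"static: {static_time}, dynamic: {dynamic_time}")
```

For this input the static schedule finishes after 300 cost units, because all three heavy tiles land on the same SM, while the dynamic schedule finishes after 102. On a GPU, a comparable effect is commonly obtained by having thread blocks fetch work items through an atomically incremented counter; whether Balanced 3DGS uses exactly that mechanism is an assumption here.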