AR · GR · Apr 2

GEMM-GS: Accelerating 3D Gaussian Splatting on Tensor Cores with GEMM-Compatible Blending

arXiv:2604.02120 · 28.9 · Has Code
Predicted impact: top 5% in AR (last 90 days) · Originality: Incremental advance
AI Analysis

This work targets rendering latency in real-time computer vision and graphics applications. It is an incremental advance because it optimizes an existing method rather than changing the underlying approach.

The paper tackles slow rendering in 3D Gaussian Splatting (3DGS) for 3D scene reconstruction by reformulating the blending process into a form that GPU Tensor Cores can execute. This yields a 1.42x speedup over vanilla 3DGS and an additional 1.47x speedup on average when combined with existing acceleration methods.
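The summary does not spell out how blending becomes GEMM-compatible. One way such a reformulation can work, shown here purely as an illustrative sketch and not as the paper's kernel, is to expand each projected Gaussian's exponent $-\tfrac{1}{2} d^\top \Sigma^{-1} d$ (with $d = (x - \mu_x, y - \mu_y)$) into a linear combination of the pixel monomials $[x^2, xy, y^2, x, y, 1]$. Evaluating every Gaussian at every pixel then becomes one (pixels x 6) by (6 x Gaussians) matrix product. The tuple layout, the helper names (`pixel_features`, `gaussian_coeffs`, `exponents_gemm`, `composite`), the scalar gray color, and the 0.99 alpha clamp (borrowed from the reference 3DGS rasterizer) are assumptions; the sketch omits early termination and low-alpha skipping.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical layout: (mean_x, mean_y, conic_a, conic_b, conic_c, opacity, gray_color),
# where [[a, b], [b, c]] is the inverse 2D covariance ("conic") of a projected Gaussian.
Gaussian = Tuple[float, float, float, float, float, float, float]
Matrix = List[List[float]]


def pixel_features(x: float, y: float) -> List[float]:
    # One GEMM row per pixel: the monomials appearing in the quadratic exponent.
    return [x * x, x * y, y * y, x, y, 1.0]


def gaussian_coeffs(g: Gaussian) -> List[float]:
    # One GEMM column per Gaussian: -0.5 * (a dx^2 + 2b dx dy + c dy^2), expanded in x and y.
    mx, my, a, b, c = g[:5]
    return [
        -0.5 * a,
        -b,
        -0.5 * c,
        a * mx + b * my,
        b * mx + c * my,
        -0.5 * (a * mx * mx + 2.0 * b * mx * my + c * my * my),
    ]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    # Plain (m x k) @ (k x n) product; on a GPU this is the part Tensor Cores would run.
    inner, cols = len(B), len(B[0])
    return [[sum(row[k] * B[k][j] for k in range(inner)) for j in range(cols)] for row in A]


def exponents_gemm(pixels: Sequence[Tuple[float, float]],
                   gaussians: Sequence[Gaussian]) -> Matrix:
    P = [pixel_features(x, y) for x, y in pixels]                              # pixels x 6
    G = [list(col) for col in zip(*(gaussian_coeffs(g) for g in gaussians))]   # 6 x gaussians
    return matmul(P, G)                                                        # pixels x gaussians


def exponents_direct(pixels: Sequence[Tuple[float, float]],
                     gaussians: Sequence[Gaussian]) -> Matrix:
    # Reference: per-pixel, per-Gaussian evaluation, as in a conventional rasterizer loop.
    out = []
    for x, y in pixels:
        row = []
        for mx, my, a, b, c, *_ in gaussians:
            dx, dy = x - mx, y - my
            row.append(-0.5 * (a * dx * dx + 2.0 * b * dx * dy + c * dy * dy))
        out.append(row)
    return out


def composite(E: Matrix, gaussians: Sequence[Gaussian]) -> List[float]:
    # Front-to-back alpha blending; the Gaussians must already be depth-sorted.
    result = []
    for row in E:
        transmittance, color = 1.0, 0.0
        for power, g in zip(row, gaussians):
            alpha = min(0.99, g[5] * math.exp(power))
            color += g[6] * alpha * transmittance
            transmittance *= 1.0 - alpha
        result.append(color)
    return result


if __name__ == "__main__":
    pixels = [(0.5, 0.5), (1.5, 0.5), (3.0, 2.0)]
    gaussians = [(1.0, 1.0, 1.0, 0.2, 0.5, 0.8, 1.0),
                 (2.0, 1.5, 0.5, 0.0, 0.5, 0.6, 0.25)]
    print(composite(exponents_gemm(pixels, gaussians), gaussians))
    print(composite(exponents_direct(pixels, gaussians), gaussians))  # equal up to rounding
```

In this sketch only the exponent evaluation becomes a GEMM; the transmittance product stays a sequential per-pixel scan. The expansion also trades some numerical robustness for matrix form: with large pixel coordinates the $x^2$ and constant terms nearly cancel, which matters in the reduced-precision formats Tensor Cores favor, so taking coordinates relative to a tile origin would be a natural (assumed) mitigation.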

Neural Radiance Fields (NeRF) enables 3D scene reconstruction from several 2D images but incurs high rendering latency because of its point-sampling design. 3D Gaussian Splatting (3DGS) improves on NeRF with an explicit scene representation and an optimized pipeline, yet still falls short of practical real-time demands. Existing acceleration works overlook the evolving Tensor Cores of modern GPUs because the 3DGS pipeline lacks General Matrix Multiplication (GEMM) operations. This paper proposes GEMM-GS, an acceleration approach that exploits GPU Tensor Cores through a GEMM-friendly blending transformation: it equivalently reformulates the 3DGS blending process into a GEMM-compatible form that Tensor Cores can execute. A high-performance CUDA kernel is designed around a three-stage double-buffered pipeline that overlaps computation and memory access. Extensive experiments show that GEMM-GS achieves a $1.42\times$ speedup over vanilla 3DGS and provides an additional $1.47\times$ speedup on average when combined with existing acceleration approaches. Code is released at https://github.com/shieldforever/GEMM-GS.
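The abstract names a three-stage double-buffered pipeline without describing its stages. The sketch below only illustrates the general software-pipelining idea under assumed, hypothetical stage names (`load`, `gemm`, `blend`): at step t, stage s works on batch t - s, and batch k is staged in shared-memory buffer k % 2, so the load for one batch never writes the buffer the GEMM stage is reading. It further assumes the blend stage consumes GEMM results held in registers rather than the shared-memory buffer; with only two buffers, a blend stage reading shared memory would collide with the load two steps ahead. None of these choices is taken from the paper.

```python
from typing import Dict, List, Optional

# Hypothetical stage names; the paper's actual stage split may differ.
STAGES = ("load", "gemm", "blend")

Schedule = List[Dict[str, Optional[int]]]


def pipeline_schedule(num_batches: int) -> Schedule:
    """Software-pipelined schedule: at step t, stage s works on batch t - s (None = idle)."""
    schedule: Schedule = []
    for t in range(num_batches + len(STAGES) - 1):
        step: Dict[str, Optional[int]] = {}
        for s, name in enumerate(STAGES):
            batch = t - s
            step[name] = batch if 0 <= batch < num_batches else None
        schedule.append(step)
    return schedule


def buffer_of(batch: Optional[int]) -> Optional[int]:
    """Double buffering (assumed): batch k is staged in shared-memory buffer k % 2."""
    return None if batch is None else batch % 2


def format_schedule(schedule: Schedule) -> List[str]:
    # Render each step as one line, e.g. "t=2: load=b2 gemm=b1 blend=b0 ...".
    lines = []
    for t, step in enumerate(schedule):
        cells = [f"{n}=-" if step[n] is None else f"{n}=b{step[n]}" for n in STAGES]
        lines.append(f"t={t}: " + " ".join(cells)
                     + f"  (load writes buf {buffer_of(step['load'])},"
                     + f" gemm reads buf {buffer_of(step['gemm'])})")
    return lines


if __name__ == "__main__":
    sched = pipeline_schedule(4)
    for line in format_schedule(sched):
        print(line)
    print(len(sched), "steps vs", 4 * len(STAGES), "if the stages ran back to back")
```

Assuming every stage takes one step, four batches finish in 6 steps instead of 12. How much of that overlap a real kernel achieves depends on how well memory latency is actually hidden behind the Tensor Core work.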

