CV | Jun 5, 2025

S2GO: Streaming Sparse Gaussian Occupancy Prediction

Berkeley
arXiv:2506.05473v1 | 2 citations | h-index: 9
AI Analysis

This paper addresses slow, inflexible 3D perception in autonomous driving systems, improving both inference speed and accuracy over existing occupancy prediction methods.

The paper tackles the inefficiency and inflexibility of dense 3D representations in occupancy prediction by introducing S2GO, a streaming sparse Gaussian method that uses compact 3D queries propagated over time. It achieves state-of-the-art performance, outperforming prior methods by 1.5 IoU with 5.9x faster inference on nuScenes and KITTI benchmarks.

Despite the demonstrated efficiency and performance of sparse query-based representations for perception, state-of-the-art 3D occupancy prediction methods still rely on voxel-based or dense Gaussian-based 3D representations. However, dense representations are slow, and they lack flexibility in capturing the temporal dynamics of driving scenes. Distinct from prior work, we instead summarize the scene into a compact set of 3D queries which are propagated through time in an online, streaming fashion. These queries are then decoded into semantic Gaussians at each timestep. We couple our framework with a denoising rendering objective to guide the queries and their constituent Gaussians in effectively capturing scene geometry. Owing to its efficient, query-based representation, S2GO achieves state-of-the-art performance on the nuScenes and KITTI occupancy benchmarks, outperforming prior art (e.g., GaussianWorld) by 1.5 IoU with 5.9x faster inference.
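To make the streaming idea in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It keeps a small set of 3D queries (a center plus a feature vector), carries them from one timestep to the next, and decodes each query into a few semantic Gaussians. Everything specific here is an assumption for illustration: the sizes, the random linear decoder `W_dec`, the function names `propagate`, `decode`, and `stream`, and the choice to propagate queries by a rigid ego-motion warp. The image-based query update and the denoising rendering objective are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: queries, feature width, Gaussians per query, semantic classes.
N_QUERIES, FEAT_DIM, K, N_CLASSES = 8, 16, 4, 3

# Per Gaussian: 3 mean offsets + 3 log-scales + 1 opacity logit + class logits.
OUT_PER_GAUSSIAN = 3 + 3 + 1 + N_CLASSES

# Hypothetical decoder: one random linear layer standing in for a learned head.
W_dec = rng.normal(scale=0.1, size=(FEAT_DIM, K * OUT_PER_GAUSSIAN))


def propagate(centers, ego_motion):
    """Warp query centers into the current ego frame with a 4x4 rigid transform."""
    homo = np.concatenate([centers, np.ones((len(centers), 1))], axis=1)
    return (homo @ ego_motion.T)[:, :3]


def decode(centers, feats):
    """Decode each query into K semantic Gaussians placed around its center."""
    raw = (feats @ W_dec).reshape(len(centers), K, OUT_PER_GAUSSIAN)
    means = centers[:, None, :] + raw[..., 0:3]      # offsets relative to the query
    scales = np.exp(raw[..., 3:6])                   # strictly positive extents
    opacity = 1.0 / (1.0 + np.exp(-raw[..., 6]))     # sigmoid, in (0, 1)
    logits = raw[..., 7:]                            # per-class semantic scores
    return (means.reshape(-1, 3), scales.reshape(-1, 3),
            opacity.reshape(-1), logits.reshape(-1, N_CLASSES))


def stream(n_steps):
    """Run the toy streaming loop and return the decoded Gaussians per timestep."""
    centers = rng.uniform(-10.0, 10.0, size=(N_QUERIES, 3))
    feats = rng.normal(size=(N_QUERIES, FEAT_DIM))
    ego = np.eye(4)
    ego[0, 3] = -1.0  # ego moved 1 m forward, so static points shift by -1 m in x
    outputs = []
    for _ in range(n_steps):
        centers = propagate(centers, ego)
        # A real model would refine feats from the current camera images here.
        outputs.append(decode(centers, feats))
    return outputs
```

The point of the sketch is the cost profile: the state carried across frames is only `N_QUERIES` centers and features, and the Gaussians are regenerated from that state at each timestep rather than stored as a dense grid.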

