CV, Nov 12, 2024

ALOcc: Adaptive Lifting-Based 3D Semantic Occupancy and Cost Volume-Based Flow Predictions

arXiv:2411.07725v2, 15 citations, h-index: 17
Originality: Incremental advance
AI Analysis

This work addresses spatiotemporal scene understanding for autonomous systems, presenting incremental improvements to existing frameworks.

The paper tackles 3D semantic occupancy and flow prediction by proposing a vision-based framework with three improvements: an occlusion-aware adaptive lifting mechanism with depth denoising, 3D-2D semantic consistency enforcement via optimized prototypes, and a BEV-centric cost volume for joint prediction. The method achieves new state-of-the-art performance on multiple benchmarks and offers a real-time version that exceeds existing real-time methods in speed and accuracy.

3D semantic occupancy and flow prediction are fundamental to spatiotemporal scene understanding. This paper proposes a vision-based framework with three targeted improvements. First, we introduce an occlusion-aware adaptive lifting mechanism incorporating depth denoising. This enhances the robustness of 2D-to-3D feature transformation while mitigating reliance on depth priors. Second, we enforce 3D-2D semantic consistency via jointly optimized prototypes, using confidence- and category-aware sampling to address the long-tail classes problem. Third, to streamline joint prediction, we devise a BEV-centric cost volume to explicitly correlate semantic and flow features, supervised by a hybrid classification-regression scheme that handles diverse motion scales. Our purely convolutional architecture establishes new SOTA performance on multiple benchmarks for both semantic occupancy and joint occupancy semantic-flow prediction. We also present a family of models offering a spectrum of efficiency-performance trade-offs. Our real-time version exceeds all existing real-time methods in speed and accuracy, ensuring its practical viability.
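The third component (a BEV cost volume decoded with a classification-plus-regression step) can be illustrated with a toy sketch. This is not the authors' implementation: the function names `bev_cost_volume` and `decode_offset`, the cosine-similarity matching, the square search window, and the softmax temperature are all assumptions made for illustration, and pure Python lists stand in for the tensors a real model would use. The idea shown is only the general one: correlate each current-frame BEV cell with nearby previous-frame cells, treat the candidate offsets as classification bins, and read out a continuous displacement as the probability-weighted mean of the bin centers.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def bev_cost_volume(curr, prev, radius):
    """Correlate two H x W BEV grids of feature vectors.

    Returns (offsets, cost), where cost[i][j][k] is the similarity between
    curr[i][j] and prev[i + dy][j + dx] for offsets[k] == (dy, dx).
    Offsets that fall outside the grid keep a similarity of 0.
    """
    offsets = [(dy, dx)
               for dy in range(-radius, radius + 1)
               for dx in range(-radius, radius + 1)]
    h, w = len(curr), len(curr[0])
    cost = [[[0.0] * len(offsets) for _ in range(w)] for _ in range(h)]
    for i in range(h):
        for j in range(w):
            for k, (dy, dx) in enumerate(offsets):
                y, x = i + dy, j + dx
                if 0 <= y < h and 0 <= x < w:
                    cost[i][j][k] = cosine(curr[i][j], prev[y][x])
    return offsets, cost


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_offset(cell_cost, offsets, temperature=0.05):
    """Hybrid decode (assumed form): classify over offset bins, then
    regress a continuous offset as the expectation of the bin centers."""
    probs = softmax([c / temperature for c in cell_cost])
    dy = sum(p * o[0] for p, o in zip(probs, offsets))
    dx = sum(p * o[1] for p, o in zip(probs, offsets))
    return dy, dx


# Toy scene: one "object" feature moves one cell to the right between frames.
BG, OBJ = [1.0, 0.0], [0.0, 1.0]
prev_bev = [[BG, BG, BG], [BG, OBJ, BG], [BG, BG, BG]]
curr_bev = [[BG, BG, BG], [BG, BG, OBJ], [BG, BG, BG]]

offsets, cost = bev_cost_volume(curr_bev, prev_bev, radius=1)
dy, dx = decode_offset(cost[1][2], offsets)
print(round(dy, 3), round(dx, 3))
```

In this toy scene the decoded offset for the object cell points one column back toward its previous-frame location. Treating offsets as bins lets the classification part handle large, multi-modal motions, while the expectation step recovers sub-cell precision; how the paper actually combines the two losses is not specified in the abstract above.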

Code Implementations: 1 repo