Deflickering Vision-Based Occupancy Networks through Lightweight Spatio-Temporal Correlation
This work addresses flickering issues that compromise visual experience and decision-making in autonomous driving. It represents an incremental improvement over existing methods.
The paper tackles temporal inconsistencies that cause flickering in vision-based occupancy networks for autonomous driving, and proposes a plugin framework that eliminates flickering artifacts with negligible computational overhead.
Vision-based occupancy networks (VONs) provide an end-to-end solution for reconstructing 3D environments in autonomous driving. However, existing methods often suffer from temporal inconsistencies, manifesting as flickering effects that compromise visual experience and adversely affect decision-making. While recent approaches have incorporated historical data to mitigate the issue, they often incur high computational costs and may introduce noisy information that interferes with object detection. We propose OccLinker, a novel plugin framework designed to integrate seamlessly with existing VONs and boost their performance. Our method efficiently consolidates historical static and motion cues, learns sparse latent correlations with current features through a dual cross-attention mechanism, and produces correction occupancy components to refine the base network's predictions. We also propose a new temporal consistency metric to quantitatively identify flickering effects. Extensive experiments on two benchmark datasets demonstrate that our method delivers superior performance with negligible computational overhead, while effectively eliminating flickering artifacts.
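The refinement idea in the abstract (attend from current features to historical static and motion cues, then add a correction to the base prediction) can be illustrated with a toy sketch. This is not the OccLinker implementation: the function names `cross_attend` and `refine`, the use of plain scaled dot-product attention, and the averaging step that stands in for a learned projection head are all assumptions made for illustration, and the sketch omits the sparsity and learned weights the abstract mentions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, keys, values):
    """Scaled dot-product cross-attention: each query attends over all keys."""
    scale = math.sqrt(len(keys[0]))
    dim = len(values[0])
    attended = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        attended.append(
            [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
        )
    return attended


def refine(base_logits, current, static_hist, motion_hist):
    """Add a correction term to each base occupancy logit.

    Two attention branches (hypothetical stand-in for the "dual" mechanism):
    one over historical static cues, one over historical motion cues.
    """
    static_ctx = cross_attend(current, static_hist, static_hist)
    motion_ctx = cross_attend(current, motion_hist, motion_hist)
    refined = []
    for logit, s, m in zip(base_logits, static_ctx, motion_ctx):
        # Assumption: a plain mean replaces the learned projection head.
        correction = (sum(s) + sum(m)) / (len(s) + len(m))
        refined.append(logit + correction)
    return refined


if __name__ == "__main__":
    base = [0.5, -1.0]                       # base network logits, one per voxel
    current = [[1.0, 0.0], [0.0, 1.0]]       # current-frame features per voxel
    static_hist = [[0.2, 0.1], [0.0, 0.3]]   # made-up historical static cues
    motion_hist = [[-0.1, 0.4], [0.3, 0.0]]  # made-up historical motion cues
    print(refine(base, current, static_hist, motion_hist))
```

Because the correction is added to the base logits rather than replacing them, the base network stays untouched, which is consistent with the plugin design the abstract describes. When the historical cues carry no signal (all zeros in this toy), the output equals the base prediction.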