CV · May 23, 2023

Sparse4D v2: Recurrent Temporal Fusion with Sparse Model

arXiv:2305.14018v2 · 111 citations · Has Code
Originality: Incremental advance
AI Analysis

This work targets the efficiency and accuracy of 3D detection for autonomous driving systems and is an incremental improvement over the original Sparse4D.

The paper improves multi-view temporal perception by enhancing the temporal fusion module in Sparse4D with a recursive form of multi-frame feature sampling, achieving state-of-the-art results on the nuScenes 3D detection benchmark.

Sparse algorithms offer great flexibility for multi-view temporal perception tasks. In this paper, we present an enhanced version of Sparse4D, in which we improve the temporal fusion module by implementing a recursive form of multi-frame feature sampling. By effectively decoupling image features and structured anchor features, Sparse4D enables a highly efficient transformation of temporal features, thereby facilitating temporal fusion solely through the frame-by-frame transmission of sparse features. The recurrent temporal fusion approach provides two main benefits. Firstly, it reduces the computational complexity of temporal fusion from $O(T)$ to $O(1)$, resulting in significant improvements in inference speed and memory usage. Secondly, it enables the fusion of long-term information, leading to more pronounced performance improvements due to temporal fusion. Our proposed approach, Sparse4Dv2, further enhances the performance of the sparse perception algorithm and achieves state-of-the-art results on the nuScenes 3D detection benchmark. Code will be available at https://github.com/linxuewu/Sparse4D.
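The $O(T)$ versus $O(1)$ claim in the abstract can be illustrated with a toy cost count. The following is a minimal sketch, not the Sparse4D implementation: `sample_features`, `multi_frame_step`, `recurrent_step`, the scalar "frames", and the momentum-based blend are all hypothetical stand-ins chosen for illustration. It only shows that re-sampling a window of T frames at every step costs T sampling operations, while carrying a sparse state forward costs one.

```python
from collections import deque


def sample_features(frame, anchors, stats):
    """Toy stand-in for image feature sampling; each call counts as one unit of cost."""
    stats["samples"] += 1
    return [frame * a for a in anchors]  # hypothetical per-anchor feature


def multi_frame_step(history, anchors, stats):
    """Non-recurrent fusion: re-sample every frame in the window, then average."""
    per_frame = [sample_features(f, anchors, stats) for f in history]
    return [sum(col) / len(per_frame) for col in zip(*per_frame)]


def recurrent_step(frame, anchors, prev_state, stats, momentum=0.5):
    """Recurrent fusion: sample the current frame only and blend with the carried state."""
    current = sample_features(frame, anchors, stats)
    if prev_state is None:
        return current
    # Assumed blend rule (simple momentum average), not the paper's fusion module.
    return [momentum * p + (1.0 - momentum) * c for p, c in zip(prev_state, current)]


def run(frames, anchors, window):
    """Process a frame stream with both strategies and report sampling costs."""
    multi_stats, rec_stats = {"samples": 0}, {"samples": 0}
    history, state = deque(maxlen=window), None
    for frame in frames:
        history.append(frame)
        multi_frame_step(history, anchors, multi_stats)
        state = recurrent_step(frame, anchors, state, rec_stats)
    return multi_stats["samples"], rec_stats["samples"], state


multi_cost, recurrent_cost, final_state = run(
    frames=[1.0, 2.0, 3.0, 4.0], anchors=[1.0, 2.0], window=3
)
print(multi_cost, recurrent_cost, final_state)
```

With a window of 3 over four frames, the windowed variant performs 1 + 2 + 3 + 3 = 9 sampling calls, while the recurrent variant performs one per frame. The recurrent state also retains a decaying contribution from every earlier frame, which is the sense in which long-term information can be fused without enlarging the window.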

Code Implementations: 1 repo
