CV · Jan 21, 2025

SMamba: Sparse Mamba for Event-based Object Detection

arXiv:2501.11971v1 · 23 citations · h-index: 6 · AAAI
Originality: Incremental advance
AI Analysis

This work addresses the trade-off between efficiency and accuracy in event-based object detection, which matters for real-time applications such as robotics and autonomous vehicles. The advance is incremental: it builds on existing Mamba and sparsification methods.

The paper tackles the high computational overhead of transformer-based event-based object detection by proposing SMamba, which uses adaptive sparsification to reduce computation while preserving global modeling. The authors report state-of-the-art performance and efficiency on three datasets.

Transformer-based methods have achieved remarkable performance in event-based object detection, owing to their global modeling ability. However, they neglect the influence of non-event and noisy regions and process all regions uniformly, leading to high computational overhead. To reduce the computation cost, some researchers propose window-attention-based sparsification strategies that discard unimportant regions, but these sacrifice the global modeling ability and result in suboptimal performance. To achieve a better trade-off between accuracy and efficiency, we propose Sparse Mamba (SMamba), which performs adaptive sparsification to reduce computational effort while maintaining global modeling capability. Specifically, a Spatio-Temporal Continuity Assessment module is proposed to measure the information content of tokens and discard uninformative ones by leveraging the differences in spatiotemporal distribution between activity events and noise events. Based on the assessment results, an Information-Prioritized Local Scan strategy is designed to shorten the scan distance between high-information tokens, facilitating interactions among them in the spatial dimension. Furthermore, to extend the global interaction from 2D space to 3D representations, a Global Channel Interaction module is proposed to aggregate channel information from a global spatial perspective. Results on three datasets (Gen1, 1Mpx, and eTram) demonstrate that our model outperforms other methods in both performance and efficiency.
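
The token-scoring and scan-ordering ideas in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`continuity_scores`, `keep_informative`, `local_scan_order`), the neighbour-count scoring rule, the `keep_ratio` and `window` parameters, and the binary voxel-grid input are all assumptions made for illustration. The only property it borrows from the abstract is that activity events tend to be continuous in space and time while noise events tend to be isolated. The Global Channel Interaction module is not sketched.

```python
from itertools import product


def continuity_scores(voxels):
    """Score each spatial token by spatio-temporal continuity (assumed rule).

    voxels[t][y][x] is 1 if an event fired in that cell, else 0.
    For every active cell, count the active cells in its 3x3x3
    spatio-temporal neighbourhood and add that count to scores[y][x].
    """
    T, H, W = len(voxels), len(voxels[0]), len(voxels[0][0])
    offsets = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]
    scores = [[0] * W for _ in range(H)]
    for t, y, x in product(range(T), range(H), range(W)):
        if not voxels[t][y][x]:
            continue
        for dt, dy, dx in offsets:
            tt, yy, xx = t + dt, y + dy, x + dx
            if 0 <= tt < T and 0 <= yy < H and 0 <= xx < W and voxels[tt][yy][xx]:
                scores[y][x] += 1
    return scores


def keep_informative(scores, keep_ratio=0.5):
    """Keep at most keep_ratio of all tokens, best first, dropping zero scores."""
    ranked = sorted(
        ((s, y, x) for y, row in enumerate(scores) for x, s in enumerate(row) if s > 0),
        key=lambda item: -item[0],
    )
    total = len(scores) * len(scores[0])
    k = max(1, int(total * keep_ratio))
    return [(y, x) for _, y, x in ranked[:k]]


def local_scan_order(kept, scores, window=2):
    """Visit kept tokens window by window, highest score first in each window."""
    return sorted(
        kept,
        key=lambda p: (p[0] // window, p[1] // window, -scores[p[0]][p[1]], p),
    )


# Toy input: an edge moving right along row 1, plus one isolated noise event.
T, H, W = 3, 4, 4
voxels = [[[0] * W for _ in range(H)] for _ in range(T)]
for t, y, x in [(0, 1, 0), (1, 1, 1), (2, 1, 2)]:  # continuous activity
    voxels[t][y][x] = 1
voxels[0][3][3] = 1  # isolated noise

scores = continuity_scores(voxels)
kept = keep_informative(scores, keep_ratio=0.25)
order = local_scan_order(kept, scores, window=2)
print(scores[1], kept, order)
```

In this toy input the isolated event at (3, 3) has no active neighbours, so its token scores zero and is discarded, while the three tokens along the moving edge are kept and then scanned window by window. A real model would score learned token features rather than raw binary occupancy, and the selection would need to be compatible with training.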

Code Implementations: 1 repo
