CV · Sep 25, 2024

PASS: Path-selective State Space Model for Event-based Recognition

Jiazhou Zhou, Kanghao Chen, Lei Zhang, Lin Wang

arXiv:2409.16953v22.01 citations · h-index: 23

Originality: Incremental advance

AI Analysis

This work improves event-based recognition for applications like robotics and surveillance by enabling better generalization across different event lengths and frequencies, though it is incremental as it builds on state space models with novel modules.

The paper tackles event-based object/action recognition, addressing a limitation of existing methods: they process events at fixed temporal intervals, which leads to poor generalization across temporal frequencies. The proposed PASS framework achieves superior performance on five public datasets and loses less accuracy when the inference frequency varies (a drop of 8.62% versus 20.69% for the baseline).

Event cameras are bio-inspired sensors that capture intensity changes asynchronously and offer distinct advantages, such as high temporal resolution. Existing methods for event-based object/action recognition predominantly sample events and convert them into representations at a fixed temporal interval (or frequency). However, they are constrained to processing a limited number of event lengths and show poor frequency generalization, and thus do not fully leverage the events' high temporal resolution. In this paper, we present our PASS framework, which exhibits superior capacity for spatiotemporal event modeling over a larger number of event lengths and generalizes across varying inference temporal frequencies. Our key insight is to learn adaptively encoded event features via state space models (SSMs), whose linear complexity and generalization on input frequency make them ideal for processing high temporal resolution events. Specifically, we propose a Path-selective Event Aggregation and Scan (PEAS) module to encode events into features with fixed dimensions by adaptively scanning and selecting aggregated event representations. On top of it, we introduce a novel Multi-faceted Selection Guiding (MSG) loss to minimize the randomness and redundancy of the encoded features during the PEAS selection process. Our method outperforms prior methods on five public datasets and shows strong generalization across varying inference frequencies with a smaller accuracy drop (ours -8.62% vs. -20.69% for the baseline). Overall, PASS exhibits strong long spatiotemporal modeling for a broader distribution of event length (1-10^9), precise temporal perception, and generalization for real-world
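To make the abstract's remark about SSMs concrete, the following is a minimal sketch of a generic scalar state space model, not the PASS or PEAS implementation. It assumes a continuous-time system x'(t) = a·x(t) + b·u(t) discretized with the standard zero-order-hold rule, and all function names (`zoh_discretize`, `ssm_scan`, `sample`, `demo_signal`) and parameter values are hypothetical choices for illustration. The scan costs one update per input sample (linear in sequence length), and when the step size `dt` matches the sampling rate, the same signal sampled at two different frequencies yields nearly the same final state.

```python
import math


def zoh_discretize(a, b, dt):
    """Zero-order-hold discretization of x'(t) = a*x(t) + b*u(t), with a != 0."""
    a_bar = math.exp(a * dt)
    b_bar = (a_bar - 1.0) / a * b
    return a_bar, b_bar


def ssm_scan(samples, dt, a=-1.0, b=1.0):
    """Run the linear recurrence x_k = a_bar*x_{k-1} + b_bar*u_k over the samples.

    One multiply-add per sample, so the cost is linear in len(samples).
    """
    a_bar, b_bar = zoh_discretize(a, b, dt)
    x = 0.0
    for u in samples:
        x = a_bar * x + b_bar * u
    return x


def sample(signal, duration, n_samples):
    """Sample a continuous signal at n_samples evenly spaced times in [0, duration)."""
    dt = duration / n_samples
    return [signal(k * dt) for k in range(n_samples)]


def demo_signal(t):
    # Hypothetical input: a 1 Hz sine wave.
    return math.sin(2.0 * math.pi * t)


duration = 2.0
low_rate = sample(demo_signal, duration, 200)     # 100 Hz sampling
high_rate = sample(demo_signal, duration, 2000)   # 1 kHz sampling

# Step size matched to each sampling rate: final states nearly agree.
state_low = ssm_scan(low_rate, dt=duration / 200)
state_high = ssm_scan(high_rate, dt=duration / 2000)

# Step size left at the 100 Hz value while feeding 1 kHz samples:
# the model sees a stretched signal and ends in a different state.
state_mismatch = ssm_scan(high_rate, dt=duration / 200)

print(state_low, state_high, state_mismatch)
```

The mismatched call mimics a model that assumes a fixed temporal interval regardless of the actual input frequency, which is the failure mode the abstract attributes to existing methods. How PASS itself handles varying frequencies is described only at a high level here, so this sketch should not be read as its mechanism.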
