Weidong Hao

h-index10
2papers

2 Papers

CVFeb 4
HoloEv-Net: Efficient Event-based Action Recognition via Holographic Spatial Embedding and Global Spectral Gating

Weidong Hao

Event-based Action Recognition (EAR) has attracted significant attention due to the high temporal resolution and high dynamic range of event cameras. However, existing methods typically suffer from (i) the computational redundancy of dense voxel representations, (ii) structural redundancy inherent in multi-branch architectures, and (iii) the under-utilization of spectral information in capturing global motion patterns. To address these challenges, we propose an efficient EAR framework named HoloEv-Net. First, to simultaneously tackle representation and structural redundancies, we introduce a Compact Holographic Spatiotemporal Representation (CHSR). Departing from computationally expensive voxel grids, CHSR implicitly embeds horizontal spatial cues into the Time-Height (T-H) view, effectively preserving 3D spatiotemporal contexts within a 2D representation. Second, to exploit the neglected spectral cues, we design a Global Spectral Gating (GSG) module. By leveraging the Fast Fourier Transform (FFT) for global token mixing in the frequency domain, GSG enhances the representation capability with negligible parameter overhead. Extensive experiments demonstrate the scalability and effectiveness of our framework. Specifically, HoloEv-Net-Base achieves state-of-the-art performance on THU-EACT-50-CHL, HARDVS and DailyDVS-200, outperforming existing methods by 10.29%, 1.71% and 6.25%, respectively. Furthermore, our lightweight variant, HoloEv-Net-Small, delivers highly competitive accuracy while offering extreme efficiency, reducing parameters by 5.4 times, FLOPs by 300times, and latency by 2.4times compared to heavy baselines, demonstrating its potential for edge deployment.

CVSep 4, 2025
EGTM: Event-guided Efficient Turbulence Mitigation

Huanan Li, Rui Fan, Juntao Guan et al.

Turbulence mitigation (TM) aims to remove the stochastic distortions and blurs introduced by atmospheric turbulence into frame cameras. Existing state-of-the-art deep-learning TM methods extract turbulence cues from multiple degraded frames to find the so-called "lucky'', not distorted patch, for "lucky fusion''. However, it requires high-capacity network to learn from coarse-grained turbulence dynamics between synchronous frames with limited frame-rate, thus fall short in computational and storage efficiency. Event cameras, with microsecond-level temporal resolution, have the potential to fundamentally address this bottleneck with efficient sparse and asynchronous imaging mechanism. In light of this, we (i) present the fundamental \textbf{``event-lucky insight''} to reveal the correlation between turbulence distortions and inverse spatiotemporal distribution of event streams. Then, build upon this insight, we (ii) propose a novel EGTM framework that extracts pixel-level reliable turbulence-free guidance from the explicit but noisy turbulent events for temporal lucky fusion. Moreover, we (iii) build the first turbulence data acquisition system to contribute the first real-world event-driven TM dataset. Extensive experimental results demonstrate that our approach significantly surpass the existing SOTA TM method by 710 times, 214 times and 224 times in model size, inference latency and model complexity respectively, while achieving the state-of-the-art in restoration quality (+0.94 PSNR and +0.08 SSIM) on our real-world EGTM dataset. This demonstrating the great efficiency merit of introducing event modality into TM task. Demo code and data have been uploaded in supplementary material and will be released once accepted.