CV | Aug 7, 2025

Revealing Latent Information: A Physics-inspired Self-supervised Pre-training Framework for Noisy and Sparse Events

Lin Zhu, Ruonan Liu, Xiao Wang, Lizhi Wang, Hua Huang

arXiv:2508.05507v1 | 1 citation | h-index: 6 | Has Code | MM

Originality: Incremental advance

AI Analysis

This work addresses a domain-specific problem for event-based vision systems, offering a robust pre-training solution to improve feature extraction from challenging event data.

The paper tackles the challenge of extracting effective features from sparse and noisy event camera data by proposing a self-supervised pre-training framework that reveals latent information like edges and textures. The framework consistently outperforms state-of-the-art methods on downstream tasks such as object recognition, semantic segmentation, and optical flow estimation.

The event camera, a novel neuromorphic vision sensor, records data with high temporal resolution and wide dynamic range, offering new possibilities for accurate visual representation in challenging scenarios. However, event data is inherently sparse and noisy and mainly reflects brightness changes, which complicates effective feature extraction. To address this, we propose a self-supervised pre-training framework that fully reveals the latent information in event data, including edge information and texture cues. Our framework consists of three stages. Difference-guided Masked Modeling, inspired by the physical sampling process of events, reconstructs temporal intensity difference maps to extract enhanced information from raw event data. Backbone-fixed Feature Transition contrasts event and image features without updating the backbone, preserving the representations learned from masked modeling and stabilizing their effect on contrastive learning. Focus-aimed Contrastive Learning updates the entire model to improve semantic discrimination by focusing on high-value regions. Extensive experiments show that our framework is robust and consistently outperforms state-of-the-art methods on various downstream tasks, including object recognition, semantic segmentation, and optical flow estimation. The code and dataset are available at https://github.com/BIT-Vision/EventPretrain.
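To make the link between events and temporal intensity difference maps concrete, here is a minimal sketch. It is not the authors' implementation. It assumes the standard idealized event model, in which a pixel emits an event of polarity +1 or -1 each time its log intensity changes by a fixed contrast threshold. The function names, the `(x, y, polarity)` event layout, and the threshold value are hypothetical choices for illustration only.

```python
def events_from_frames(log_prev, log_next, threshold=0.25):
    """Idealized event sampling between two log-intensity frames.

    Emits one (x, y, polarity) event per full threshold crossing at each
    pixel. Real sensors also add noise and timestamps; both are ignored here.
    """
    events = []
    for y, (row_prev, row_next) in enumerate(zip(log_prev, log_next)):
        for x, (a, b) in enumerate(zip(row_prev, row_next)):
            diff = b - a
            count = int(abs(diff) / threshold)  # full threshold crossings
            polarity = 1 if diff > 0 else -1
            events.extend((x, y, polarity) for _ in range(count))
    return events


def difference_map(events, height, width, threshold=0.25):
    """Sum signed polarities into an approximate log-intensity difference map."""
    diff = [[0.0] * width for _ in range(height)]
    for x, y, polarity in events:
        diff[y][x] += polarity * threshold
    return diff


if __name__ == "__main__":
    prev = [[0.0, 1.0], [0.5, 0.25]]
    nxt = [[0.5, 0.25], [0.6, 0.25]]
    evs = events_from_frames(prev, nxt)
    print(len(evs), difference_map(evs, 2, 2))
```

Changes smaller than the threshold produce no events, so the accumulated map is only a quantized approximation of the true difference. This is one reason why raw event data is sparse and why a reconstruction target of this kind can expose information the events alone do not show.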
