CV · AI · MM · Dec 10, 2024

EvRepSL: Event-Stream Representation via Self-Supervised Learning for Event-Based Vision

arXiv:2412.07080v1 · 17 citations · h-index: 7 · IEEE Transactions on Image Processing
Originality: Incremental advance
AI Analysis

This work addresses the challenge of improving event-stream representations for computer vision tasks using event cameras, offering a versatile solution that is an incremental advance over existing methods.

The paper tackles the problem of noisy event-stream representations for event-based vision by introducing EvRepSL, a self-supervised learning method that converts event-streams into high-quality representations, achieving superior performance on classification and optical flow datasets across various event cameras.

Event-stream representation is the first step for many computer vision tasks using event cameras. It converts asynchronous event-streams into a structured format so that conventional machine learning models can be applied easily. However, most state-of-the-art event-stream representations are manually designed, and their quality cannot be guaranteed because event-streams are inherently noisy. In this paper, we introduce a data-driven approach to enhancing the quality of event-stream representations. Our approach begins by introducing a new event-stream representation based on spatial-temporal statistics, denoted EvRep. Subsequently, we theoretically derive the intrinsic relationship between asynchronous event-streams and synchronous video frames. Building on this relationship, we train a representation generator, RepGen, in a self-supervised manner, with EvRep as input. Finally, event-streams are converted into high-quality representations, termed EvRepSL, by passing them through the learned RepGen (without any fine-tuning or retraining). Our methodology is rigorously validated through extensive evaluations on a variety of mainstream event-based classification and optical flow datasets captured with various types of event cameras. The experimental results highlight not only our approach's superior performance over existing event-stream representations but also its versatility: it is agnostic to different event cameras and tasks.
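The two ingredients named in the abstract, a statistics-based representation and a link between events and frames, can be illustrated with a minimal sketch. It assumes events arrive as arrays (x, y, t, p) with polarity p in {-1, +1}. The three per-pixel channels in `event_statistics` (event count, net polarity, timestamp standard deviation) are a hypothetical choice made for illustration; this summary does not state EvRep's exact definition. The event-frame link uses the widely known event generation model (each event marks a log-intensity change of roughly ±C at its pixel), assumed here as a starting point rather than as the paper's own derivation. `event_statistics`, `predicted_log_intensity_change`, and `contrast_threshold` are hypothetical names.

```python
import numpy as np


def event_statistics(x, y, t, p, height, width):
    """Hypothetical EvRep-style tensor of shape (3, H, W).

    Channels (an assumption, not the paper's exact definition):
      0: number of events per pixel
      1: net polarity per pixel (sum of +1/-1)
      2: population standard deviation of event timestamps per pixel
    Pixels without events are 0 in every channel.
    """
    x = np.asarray(x, dtype=np.int64)
    y = np.asarray(y, dtype=np.int64)
    t = np.asarray(t, dtype=np.float64)
    p = np.asarray(p, dtype=np.float64)

    idx = y * width + x          # flatten (x, y) into a single pixel index
    n = height * width
    count = np.bincount(idx, minlength=n).astype(np.float64)
    net_polarity = np.bincount(idx, weights=p, minlength=n)
    t_sum = np.bincount(idx, weights=t, minlength=n)
    t_sq_sum = np.bincount(idx, weights=t * t, minlength=n)

    # Mean and variance of timestamps; divide by max(count, 1) to avoid 0/0.
    safe = np.maximum(count, 1.0)
    t_mean = t_sum / safe
    t_var = np.maximum(t_sq_sum / safe - t_mean ** 2, 0.0)  # clamp float noise
    t_std = np.sqrt(t_var)

    return np.stack([count, net_polarity, t_std]).reshape(3, height, width)


def predicted_log_intensity_change(x, y, p, height, width, contrast_threshold):
    """Event generation model (assumed): summing polarities between two frame
    timestamps approximates log I(t1) - log I(t0) at each pixel, scaled by C."""
    idx = np.asarray(y, dtype=np.int64) * width + np.asarray(x, dtype=np.int64)
    net = np.bincount(idx, weights=np.asarray(p, dtype=np.float64),
                      minlength=height * width)
    return contrast_threshold * net.reshape(height, width)


if __name__ == "__main__":
    # Two positive events at pixel (0, 0) and one negative event at (1, 1).
    xs, ys, ts, ps = [0, 0, 1], [0, 0, 1], [0.0, 2.0, 1.0], [1, 1, -1]
    rep = event_statistics(xs, ys, ts, ps, height=2, width=2)
    delta = predicted_log_intensity_change(xs, ys, ps, 2, 2, contrast_threshold=0.2)
    print(rep)
    print(delta)
```

A tensor like `rep` is the kind of fixed-size input a learned generator such as RepGen could consume, while `delta` shows how frame-level intensity changes could, in principle, supply a self-supervised target. How the paper actually combines the two is not described in this summary.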

Code Implementations: 1 repo