CV · Nov 20, 2023

Event Camera Data Dense Pre-training

arXiv:2311.11533v2 · 14.121 citations · h-index: 15

Originality: Incremental advance

AI Analysis

This work addresses the challenge of pre-training neural networks for dense prediction tasks on event camera data. The contribution is incremental: it adapts existing self-supervised techniques to a specific domain whose main difficulty is spatial sparsity.

The paper targets the poor performance obtained when dense RGB pre-training is transferred to event camera data, which the authors attribute to spatial sparsity. Its self-supervised framework encodes event images into patch features, mines contextual similarities among patches, and groups them to learn discriminative features. The authors report better transfer learning performance on downstream dense prediction tasks than state-of-the-art methods.
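To make the sparsity argument concrete, here is a minimal Python sketch, not taken from the paper, that accumulates a toy event stream into an event image and measures how many pixels and how many patches stay empty. The event tuple layout `(x, y, timestamp, polarity)`, the signed-polarity accumulation, and all helper names are assumptions chosen for illustration; the paper's actual event-to-image conversion may differ.

```python
from typing import List, Tuple

# Assumed event layout: (x, y, timestamp_us, polarity in {-1, +1}).
Event = Tuple[int, int, int, int]


def events_to_image(events: List[Event], width: int, height: int) -> List[List[int]]:
    """Accumulate signed polarity per pixel (one common representation, assumed here)."""
    img = [[0] * width for _ in range(height)]
    for x, y, _t, p in events:
        img[y][x] += p
    return img


def empty_pixel_fraction(events: List[Event], width: int, height: int) -> float:
    """Fraction of pixels that received no event at all.

    Computed from event coordinates, because signed accumulation can cancel
    to zero at a pixel that did fire.
    """
    occupied = {(x, y) for x, y, _t, _p in events}
    return 1.0 - len(occupied) / (width * height)


def empty_patch_fraction(events: List[Event], width: int, height: int, patch: int) -> float:
    """Fraction of non-overlapping patch x patch cells with no events.

    Assumes width and height are divisible by `patch`.
    """
    occupied = {(x // patch, y // patch) for x, y, _t, _p in events}
    total = (width // patch) * (height // patch)
    return 1.0 - len(occupied) / total


if __name__ == "__main__":
    toy = [(0, 0, 10, 1), (0, 0, 20, -1), (3, 2, 15, 1)]
    print(events_to_image(toy, 4, 4))
    print(empty_pixel_fraction(toy, 4, 4))     # 0.875
    print(empty_patch_fraction(toy, 4, 4, 2))  # 0.5
```

On this toy 4×4 input, 87.5% of pixels are empty while only half of the 2×2 patches are, which illustrates why pooling sparse pixels into patch-level features can help. The first pixel also shows the cancellation effect: it fired twice with opposite polarity and ends up as 0 in the image.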

This paper introduces a self-supervised learning framework designed for pre-training neural networks tailored to dense prediction tasks using event camera data. Our approach uses only event data for training. Directly transferring dense RGB pre-training methods to event camera data yields subpar performance. We attribute this to the spatial sparsity inherent in an event image (converted from event data), where many pixels carry no information. To mitigate this sparsity issue, we encode an event image into event patch features, automatically mine contextual similarity relationships among patches, group the patch features into distinctive contexts, and enforce context-to-context similarities to learn discriminative event features. For training our framework, we curate a synthetic event camera dataset featuring diverse scene and motion patterns. Transfer learning performance on downstream dense prediction tasks demonstrates the superiority of our method over state-of-the-art approaches.
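The grouping and context-to-context steps in the abstract can be sketched as follows. This is one plausible reading under stated assumptions, not the authors' implementation: patch features are taken as given (no encoder), "mining contextual similarity" is replaced by greedy cosine-threshold grouping with a hypothetical threshold `tau`, each context is the normalized mean of its patches, and the context-level objective is a standard InfoNCE loss that asks context k in one augmented view to match context k in the other. All function names are hypothetical.

```python
import math
from typing import List

Vec = List[float]


def normalize(v: Vec) -> Vec:
    n = math.sqrt(sum(a * a for a in v)) or 1.0  # guard against zero vectors
    return [a / n for a in v]


def cosine(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def group_patches(feats: List[Vec], tau: float) -> List[int]:
    """Greedy grouping (stand-in for similarity mining): each patch joins the
    first context whose seed has cosine >= tau, otherwise it seeds a new one."""
    seeds: List[Vec] = []
    labels: List[int] = []
    for f in feats:
        for k, s in enumerate(seeds):
            if cosine(f, s) >= tau:
                labels.append(k)
                break
        else:
            seeds.append(f)
            labels.append(len(seeds) - 1)
    return labels


def context_embeddings(feats: List[Vec], labels: List[int]) -> List[Vec]:
    """One embedding per context: normalized mean of its normalized patch features."""
    num, dim = max(labels) + 1, len(feats[0])
    sums = [[0.0] * dim for _ in range(num)]
    for f, k in zip(feats, labels):
        for i, a in enumerate(normalize(f)):
            sums[k][i] += a
    return [normalize(s) for s in sums]


def context_infonce(ctx_a: List[Vec], ctx_b: List[Vec], temperature: float = 0.1) -> float:
    """InfoNCE over contexts: context k in view A should match context k in view B,
    with every other context in view B acting as a negative."""
    loss = 0.0
    for k, q in enumerate(ctx_a):
        logits = [sum(x * y for x, y in zip(q, c)) / temperature for c in ctx_b]
        m = max(logits)  # subtract max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += log_denom - logits[k]
    return loss / len(ctx_a)


if __name__ == "__main__":
    view_a = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    view_b = [[0.95, 0.05], [0.85, 0.15], [0.05, 0.95], [0.15, 0.85]]
    labels = group_patches(view_a, tau=0.8)
    print(labels)
    print(context_infonce(context_embeddings(view_a, labels),
                          context_embeddings(view_b, labels)))
```

Reusing the labels from one view for the other assumes the two augmentations keep patch positions aligned. A practical version would use a learned encoder, batched tensor operations, and likely a proper clustering method (for example k-means) in place of greedy thresholding.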
