CV LG | Mar 18, 2022

ESS: Learning Event-based Semantic Segmentation from Still Images

Zhaoning Sun, Nico Messikommer, Daniel Gehrig, Davide Scaramuzza

arXiv:2203.10016v2 | 26.7128 citations | h-index: 115 | Has Code

Originality: Highly original

AI Analysis

This work addresses the lack of labeled datasets for event-based semantic segmentation, enabling applications in high dynamic range and high-speed conditions where traditional image-based methods fail.

The paper tackles semantic segmentation for event cameras by developing ESS, an unsupervised domain adaptation (UDA) method that transfers knowledge from labeled still images to unlabeled events without requiring video data or motion hallucination. Using image labels alone, the approach outperforms existing UDA methods and, when combined with event labels, surpasses the supervised state of the art on both DDD17 and DSEC-Semantic.

Retrieving accurate semantic information in challenging high dynamic range (HDR) and high-speed conditions remains an open challenge for image-based algorithms due to severe image degradations. Event cameras promise to address these challenges since they feature a much higher dynamic range and are resilient to motion blur. Nonetheless, semantic segmentation with event cameras is still in its infancy, chiefly due to the lack of high-quality, labeled datasets. In this work, we introduce ESS (Event-based Semantic Segmentation), which tackles this problem by directly transferring the semantic segmentation task from existing labeled image datasets to unlabeled events via unsupervised domain adaptation (UDA). Compared to existing UDA methods, our approach aligns recurrent, motion-invariant event embeddings with image embeddings. For this reason, our method neither requires video data nor per-pixel alignment between images and events and, crucially, does not need to hallucinate motion from still images. Additionally, we introduce DSEC-Semantic, the first large-scale event-based dataset with fine-grained labels. We show that using image labels alone, ESS outperforms existing UDA approaches, and when combined with event labels, it even outperforms state-of-the-art supervised approaches on both DDD17 and DSEC-Semantic. Finally, ESS is general-purpose, which unlocks the vast amount of existing labeled image datasets and paves the way for exciting research directions in fields previously inaccessible to event cameras.

