V2CE: Video to Continuous Events Simulator
V2CE addresses the scarcity of labeled datasets for DVS-based computer vision research by providing a high-quality video-to-events simulation tool.
The paper tackles the conversion of standard video from Active Pixel Sensor (APS) cameras into event streams for Dynamic Vision Sensors (DVS), which lack sufficient labeled datasets. It proposes a method that improves the quality of generated event voxels and recovers continuous event timestamps, eliminating the temporal layering problem. The authors report state-of-the-art results, validated with quantified metrics at every stage of the pipeline.
Dynamic Vision Sensor (DVS)-based solutions have recently garnered significant interest across various computer vision tasks, offering notable benefits in terms of dynamic range, temporal resolution, and inference speed. However, as a relatively nascent vision sensor compared to Active Pixel Sensor (APS) devices such as RGB cameras, DVS suffers from a dearth of labeled datasets. Prior efforts to convert APS data into events often grapple with issues such as a considerable domain shift from real events, the absence of quantified validation, and layering problems along the time axis. In this paper, we present a novel method for video-to-event-stream conversion that addresses these issues from multiple perspectives, taking the specific characteristics of DVS into account. A series of carefully designed losses significantly enhances the quality of the generated event voxels. We also propose a novel local dynamic-aware timestamp inference strategy that accurately recovers event timestamps from event voxels in a continuous fashion and eliminates the temporal layering problem. Rigorous validation with quantified metrics at all stages of the pipeline establishes our method as the current state-of-the-art (SOTA).
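To make the temporal layering problem concrete, the following is a minimal sketch of turning an event voxel grid back into an event stream. It is not the paper's local dynamic-aware timestamp inference strategy: the function name `voxel_to_events`, the voxel layout `voxel[bin][polarity][y][x]`, and the uniform random sampling used for the continuous case are all assumptions made for illustration. With `continuous=False`, every event in a bin receives the bin's start time, so timestamps collapse into a few discrete layers. With `continuous=True`, a naive baseline spreads events uniformly inside each bin.

```python
import random


def voxel_to_events(voxel, bin_duration_us, continuous=True, seed=0):
    """Expand per-bin event counts into a list of (t, x, y, polarity) events.

    Hypothetical layout: voxel[b][p][y][x] is the number of events with
    polarity index p (0 = negative, 1 = positive) at pixel (x, y) in bin b.
    """
    rng = random.Random(seed)
    events = []
    for b, time_bin in enumerate(voxel):
        t_start = b * bin_duration_us
        for p, plane in enumerate(time_bin):
            polarity = 1 if p == 1 else -1
            for y, row in enumerate(plane):
                for x, count in enumerate(row):
                    for _ in range(count):
                        if continuous:
                            # Naive baseline: uniform timestamp inside the bin.
                            t = t_start + rng.random() * bin_duration_us
                        else:
                            # Layered: all events share the bin's start time.
                            t = float(t_start)
                        events.append((t, x, y, polarity))
    events.sort(key=lambda e: e[0])  # event streams are ordered by time
    return events


# Toy voxel grid: 2 time bins, 2 polarities, 2x2 pixels, 10 events in total.
voxel = [
    [[[2, 0], [0, 1]], [[0, 3], [0, 0]]],
    [[[0, 0], [1, 0]], [[1, 0], [0, 2]]],
]
layered = voxel_to_events(voxel, bin_duration_us=1000, continuous=False)
spread = voxel_to_events(voxel, bin_duration_us=1000, continuous=True)

layered_times = {t for t, _, _, _ in layered}
spread_times = {t for t, _, _, _ in spread}
print(len(layered), len(layered_times), len(spread_times))
```

In the layered output the ten events share only two distinct timestamps, one per bin, which is the stacking artifact the abstract describes. Uniform sampling removes the visible layers but ignores local motion, which is the gap a dynamic-aware inference strategy is meant to close.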