E2VIDiff: Perceptual Events-to-Video Reconstruction using Diffusion Priors
This work addresses the challenge of generating realistic video from event cameras, which matters for applications such as robotics and surveillance. Its contribution is incremental, however, since it applies diffusion models to an existing task.
The paper tackles the ill-posed problem of reconstructing video from event camera data, achieving a better trade-off between perceptual quality and distortion than previous methods.
Event cameras, mimicking the human retina, capture brightness changes with unparalleled temporal resolution and dynamic range. Integrating events into intensities poses a highly ill-posed challenge, marred by initial condition ambiguities. Traditional regression-based deep learning methods fall short in perceptual quality, offering deterministic and often unrealistic reconstructions. In this paper, we introduce diffusion models to events-to-video reconstruction, achieving colorful, realistic, and perceptually superior video generation from achromatic events. Powered by the image generation ability and knowledge of pretrained diffusion models, the proposed method can achieve a better trade-off between the perception and distortion of the reconstructed frame compared to previous solutions. Extensive experiments on benchmark datasets demonstrate that our approach can produce diverse, realistic frames with faithfulness to the given events.
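The initial-condition ambiguity mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's method. It assumes the commonly used idealized event model, in which a pixel fires an event of polarity +1 or -1 each time its log intensity changes by a fixed contrast threshold. The function names, the threshold value of 0.2, and the toy event list are hypothetical choices for illustration only.

```python
def integrate_events(events, height, width, contrast=0.2):
    """Accumulate signed events per pixel.

    events: iterable of (x, y, polarity), polarity in {+1, -1}.
    Returns the CHANGE in log intensity, not the absolute value.
    The contrast threshold is an assumed, idealized constant.
    """
    delta = [[0.0] * width for _ in range(height)]
    for x, y, polarity in events:
        delta[y][x] += contrast * polarity
    return delta


def reconstruct(log_i0, events, contrast=0.2):
    """Add the integrated change to an assumed initial log-intensity image."""
    height, width = len(log_i0), len(log_i0[0])
    delta = integrate_events(events, height, width, contrast)
    return [
        [log_i0[y][x] + delta[y][x] for x in range(width)]
        for y in range(height)
    ]


# Toy 1x2 sensor: two positive events at pixel (0, 0), one negative at (1, 0).
events = [(0, 0, +1), (0, 0, +1), (1, 0, -1)]

# Two different guesses for the unknown initial frame.
dark_start = [[0.0, 0.0]]
bright_start = [[1.0, 1.0]]

frame_a = reconstruct(dark_start, events)
frame_b = reconstruct(bright_start, events)

# Both frames are equally consistent with the same events; they differ
# only by the unobserved offset between the two initial images.
offsets = [frame_b[0][x] - frame_a[0][x] for x in range(2)]
```

Because the events constrain only intensity changes, any choice of initial frame yields a reconstruction that agrees with them. This is one reason a learned prior over plausible images is useful for selecting among the candidates.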