DELTA: Dense Depth from Events and LiDAR using Transformer's Attention
This work addresses depth estimation for robotics and autonomous systems by combining two complementary sensors, an event camera and a LiDAR, though it is an incremental improvement in multimodal fusion.
The paper tackles the problem of estimating dense depth maps by fusing event camera and LiDAR data, and reports a new state of the art, with errors reduced by up to a factor of four at close ranges compared to previous methods.
Event cameras and LiDARs provide complementary yet distinct data: respectively, asynchronous detections of changes in lighting and sparse but accurate depth information at a fixed rate. To this day, few works have explored the combination of these two modalities. In this article, we propose a novel neural-network-based method for fusing event and LiDAR data in order to estimate dense depth maps. Our architecture, DELTA, exploits the concepts of self- and cross-attention to model the spatial and temporal relations within and between the event and LiDAR data. Following a thorough evaluation, we demonstrate that DELTA sets a new state of the art in the event-based depth estimation problem, and that it is able to reduce errors by up to a factor of four at close ranges compared to the previous state of the art.
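
To make the distinction between self- and cross-attention concrete, the following is a minimal sketch of generic scaled dot-product attention in plain Python. It is not the DELTA architecture: the token counts, the feature dimension, the random features, and the names `event_tokens` and `lidar_tokens` are all assumptions chosen for illustration, and the learned query, key, and value projections used in a real transformer are omitted.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention without learned projections.

    Returns one output vector per query, plus the attention weights
    (one row per query, one column per key).
    """
    d = len(queries[0])
    weights = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys
        ]
        weights.append(softmax(scores))
    dim_v = len(values[0])
    output = [
        [sum(w * v[j] for w, v in zip(row, values)) for j in range(dim_v)]
        for row in weights
    ]
    return output, weights


def random_tokens(count, dim):
    """Hypothetical stand-in for encoded sensor features."""
    return [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(count)]


random.seed(0)
event_tokens = random_tokens(6, 8)  # assumed: 6 event feature tokens
lidar_tokens = random_tokens(4, 8)  # assumed: 4 LiDAR feature tokens

# Self-attention: event tokens attend to other event tokens
# (relations within one modality).
self_out, self_w = attention(event_tokens, event_tokens, event_tokens)

# Cross-attention: event tokens query the LiDAR tokens
# (relations between the two modalities).
cross_out, cross_w = attention(event_tokens, lidar_tokens, lidar_tokens)
```

In this sketch, `self_w` has one row per event token and one column per event token, while `cross_w` has one row per event token and one column per LiDAR token. Each row sums to one, so every output vector is a weighted average of the attended features.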