CV · May 11

Neuromorphic Monocular Depth Estimation with Uncertainty Modeling

arXiv:2605.10675 · 7.7
AI Analysis

For researchers in event-based vision, this work provides a systematic comparison of uncertainty methods for depth estimation, though the gains are incremental over existing representations.

This paper integrates uncertainty estimation into monocular depth estimation from event streams, comparing six event representations and three uncertainty frameworks. The best results are achieved with 10-bin log-normal and 5-bin evidential learning, showing that uncertainty can indicate reliable depth pixels.
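
To make the log-normal option concrete, here is a minimal sketch of a per-pixel log-normal negative log-likelihood. It assumes the network outputs two maps, here called `mu` and `log_sigma` (hypothetical names, not taken from the paper), which parameterise a normal distribution over log-depth. The paper's actual loss, parameterisation, and training details may differ.

```python
import numpy as np

def lognormal_nll(depth, mu, log_sigma):
    """Per-pixel negative log-likelihood of positive depth under LogNormal(mu, sigma^2).

    Assumption: mu and log_sigma are network outputs describing log-depth.
    """
    log_d = np.log(depth)
    z = (log_d - mu) / np.exp(log_sigma)
    # -log p(d) = log d + log sigma + 0.5 * log(2*pi) + 0.5 * z^2
    return log_d + log_sigma + 0.5 * np.log(2.0 * np.pi) + 0.5 * z ** 2

def lognormal_mean_var(mu, log_sigma):
    """Mean and variance of depth implied by the log-normal parameters."""
    var_log = np.exp(2.0 * log_sigma)  # sigma^2, the variance of log-depth
    mean = np.exp(mu + 0.5 * var_log)
    var = (np.exp(var_log) - 1.0) * np.exp(2.0 * mu + var_log)
    return mean, var
```

In a sketch like this, the variance returned by `lognormal_mean_var` (or simply `log_sigma`) could serve as the per-pixel uncertainty used to rank pixels by reliability.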

Event cameras offer distinct advantages over conventional frame-based sensors, including microsecond-level temporal resolution, high dynamic range, and low bandwidth. In this paper, we predict per-pixel depth distributions from monocular event streams using deep neural networks. We estimate uncertainty using Gaussian, log-normal, and evidential learning frameworks. We compare six event representations: spatio-temporal voxel grids with 1, 5, 10, and 20 temporal bins, the Compact Spatio-Temporal Representation (CSTR), and Time-Ordered Recent Event (TORE) volumes. Our U-Net-based models are trained on synthetic data and then fine-tuned on real sequences. We evaluate performance using absolute relative error, root mean squared error, and the area under the sparsification error curve. Quantitative results show that the representations perform similarly, while 10-bin log-normal and 5-bin evidential learning perform best across metrics. Our experiments demonstrate that uncertainty estimation can be successfully integrated into event-based monocular depth estimation and can be used to indicate pixels with reliable depth.
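
The three evaluation metrics named above can be sketched as follows. This is an illustrative version under stated assumptions, not the paper's evaluation code: the function names are hypothetical, the sparsification curves use absolute depth error, pixels are removed in 20 equal steps, and the area is left unnormalised (some variants in the literature normalise the curves first).

```python
import numpy as np

def abs_rel(pred, gt):
    """Absolute relative error: mean of |pred - gt| / gt over valid pixels."""
    return float(np.mean(np.abs(pred - gt) / gt))

def rmse(pred, gt):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((pred - gt) ** 2)))

def sparsification_curve(errors, ranking, fractions):
    """Mean error of the remaining pixels after removing the top-ranked fraction."""
    sorted_err = errors[np.argsort(-ranking)]  # highest-ranked pixels first
    n = len(errors)
    return np.array([sorted_err[int(f * n):].mean() for f in fractions])

def ause(pred, gt, uncertainty, steps=20):
    """Area between the uncertainty-ranked curve and the oracle (error-ranked) curve."""
    errors = np.abs(pred - gt).ravel()
    fractions = np.linspace(0.0, 1.0, steps, endpoint=False)
    by_uncertainty = sparsification_curve(errors, uncertainty.ravel(), fractions)
    oracle = sparsification_curve(errors, errors, fractions)
    diff = by_uncertainty - oracle
    # Trapezoidal rule, written out to avoid depending on a specific NumPy version.
    return float(np.sum(0.5 * (diff[1:] + diff[:-1]) * np.diff(fractions)))
```

An AUSE near zero means that ranking pixels by predicted uncertainty removes errors almost as well as ranking them by the true error, which is the sense in which uncertainty can indicate pixels with reliable depth.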

