EV-MGDispNet: Motion-Guided Event-Based Stereo Disparity Estimation Network with Left-Right Consistency
This work addresses the challenge of achieving accurate stereo disparity for robot vision in dynamic environments, representing an incremental improvement over existing methods.
The paper tackles the problem of stereo disparity estimation using event cameras by proposing EV-MGDispNet, which introduces modules to better exploit temporal information and reduce pixel shifts, resulting in improved performance with lower MAE and RMSE metrics in real-world driving scenarios.
Event cameras have the potential to revolutionize the field of robot vision, particularly in areas like stereo disparity estimation, owing to their high temporal resolution and high dynamic range. Many studies use deep learning for event camera stereo disparity estimation. However, these methods fail to fully exploit the temporal information in the event stream to acquire clear event representations. Additionally, there is room for further reduction in pixel shifts in the feature maps before constructing the cost volume. In this paper, we propose EV-MGDispNet, a novel event-based stereo disparity estimation method. Firstly, we propose an edge-aware aggregation (EAA) module, which fuses event frames and motion confidence maps to generate a novel clear event representation. Then, we propose a motion-guided attention (MGA) module, where motion confidence maps utilize deformable transformer encoders to enhance the feature map with more accurate edges. Finally, we also add a census left-right consistency loss function to enhance the left-right consistency of stereo event representation. Through conducting experiments within challenging real-world driving scenarios, we validate that our method outperforms currently known state-of-the-art methods in terms of mean absolute error (MAE) and root mean square error (RMSE) metrics.