Analyzing General-Purpose Deep-Learning Detection and Segmentation Models with Images from a Lidar as a Camera Sensor
This work addresses the challenge of robotic perception in adverse environmental conditions by applying mature vision-based DL models to lidar-derived images. The contribution is incremental in that it applies existing methods to a new type of data.
The paper tackles the problem of applying general-purpose deep learning detection and segmentation models to low-resolution 360-degree images derived from lidar sensors, in which the pixels encode depth, reflectivity, or near-infrared light. It shows that, with adequate preprocessing, these models can process such images, which enables their use in conditions where vision sensors are limited.
Over the last decade, robotic perception algorithms have significantly benefited from the rapid advances in deep learning (DL). Indeed, a significant part of the autonomy stack of different commercial and research platforms relies on DL for situational awareness, especially for processing data from vision sensors. This work explores the potential of general-purpose DL perception algorithms, specifically detection and segmentation neural networks, for processing image-like outputs of advanced lidar sensors. To the best of our knowledge, this is the first work that, rather than processing three-dimensional point cloud data, focuses on low-resolution images with a 360-degree field of view obtained with lidar sensors by encoding either depth, reflectivity, or near-infrared light in the image pixels. We show that with adequate preprocessing, general-purpose DL models can process these images, opening the door to their usage in environmental conditions where vision sensors present inherent limitations. We provide both a qualitative and quantitative analysis of the performance of a variety of neural network architectures. We believe that using DL models built for visual cameras offers significant advantages due to their much wider availability and greater maturity compared with point cloud-based perception models.
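The abstract does not state which preprocessing steps are used. As an illustration only, the sketch below shows one plausible way to turn a single-channel lidar image (depth, reflectivity, or near-infrared) into the 8-bit, three-channel input that most camera-trained models expect. The function name `lidar_image_to_rgb`, the percentile-based contrast stretch, the row repetition factor, and the 32 x 1024 frame size are all assumptions made for this example, not details taken from the paper.

```python
import numpy as np


def lidar_image_to_rgb(img, low_pct=1.0, high_pct=99.0, row_repeat=4):
    """Convert a single-channel lidar image to an 8-bit, 3-channel array.

    Hypothetical helper: the steps and default values are assumptions,
    not the preprocessing described by the authors.
    """
    img = np.asarray(img, dtype=np.float32)

    # Contrast stretch between two percentiles so that a few very bright
    # or very dark returns do not compress the rest of the value range.
    lo, hi = np.percentile(img, [low_pct, high_pct])
    if hi <= lo:
        # Constant (or nearly constant) frame: nothing to stretch.
        scaled = np.zeros_like(img)
    else:
        scaled = np.clip((img - lo) / (hi - lo), 0.0, 1.0)

    gray = (scaled * 255.0).round().astype(np.uint8)

    # Lidar images have few rows (one per beam), so repeat each row to
    # reduce the extreme aspect ratio (nearest-neighbour upscaling).
    gray = np.repeat(gray, row_repeat, axis=0)

    # Replicate the single channel three times to mimic an RGB image.
    return np.stack([gray, gray, gray], axis=-1)


if __name__ == "__main__":
    # Synthetic 16-bit frame standing in for a real sensor output.
    rng = np.random.default_rng(0)
    frame = rng.integers(0, 65536, size=(32, 1024), dtype=np.uint16)
    rgb = lidar_image_to_rgb(frame)
    print(rgb.shape, rgb.dtype)
```

The resulting array could then be passed to an off-the-shelf detection or segmentation model. In practice, the wide panorama might also need to be split into narrower tiles, which this sketch omits.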