DeepTemporalSeg: Temporally Consistent Semantic Segmentation of 3D LiDAR Scans
This work addresses the need for robust, temporally consistent semantic understanding in autonomous robot navigation. Its contribution is incremental: it combines existing techniques (dense blocks, depthwise separable convolutions, and Bayes filtering) to make per-scan predictions consistent over time.
The paper tackles temporally consistent semantic segmentation of 3D LiDAR scans for autonomous robots. It proposes a deep convolutional neural network built from dense blocks and depthwise separable convolutions, and pairs it with a Bayes filter that recursively estimates the semantic state of each point. The authors report substantial improvement over other state-of-the-art network architectures and show results for the Bayes filter on sequences from the KITTI tracking benchmark.
Understanding the semantic characteristics of the environment is a key enabler for autonomous robot operation. In this paper, we propose a deep convolutional neural network (DCNN) for the semantic segmentation of a LiDAR scan into the classes car, pedestrian, or bicyclist. The architecture is based on dense blocks and uses depthwise separable convolutions to limit the number of parameters while still maintaining state-of-the-art performance. To make the predictions from the DCNN temporally consistent, we propose a method based on a Bayes filter. This method uses the predictions from the neural network to recursively estimate the current semantic state of a point in a scan. Because the recursive estimate carries over the knowledge gained from previous scans, the predictions become temporally consistent and robust to isolated erroneous predictions. We compare the performance of our proposed architecture with other state-of-the-art neural network architectures and report substantial improvement. For the proposed Bayes filter approach, we show results on various sequences in the KITTI tracking benchmark.
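The recursive estimation described above can be illustrated with a minimal discrete Bayes filter over the semantic class of a single point. This is only a sketch under stated assumptions, not the authors' implementation: the extra "background" class, the transition model with a fixed probability of keeping the same label, the use of the network's per-class probabilities as the measurement likelihood, and the helper names `make_transition`, `bayes_step`, and `filter_point` are all hypothetical. It also assumes the same point has already been associated across consecutive scans.

```python
import numpy as np

# Illustrative class list; "background" is an assumption, not taken from the abstract.
CLASSES = ("background", "car", "pedestrian", "bicyclist")


def make_transition(num_classes, stay_prob=0.9):
    """Assumed motion model: a point keeps its label with stay_prob,
    otherwise it switches to any other label with equal probability."""
    off_diagonal = (1.0 - stay_prob) / (num_classes - 1)
    transition = np.full((num_classes, num_classes), off_diagonal)
    np.fill_diagonal(transition, stay_prob)
    return transition


def bayes_step(belief, network_probs, transition):
    """One recursive update: predict with the transition model, then
    correct with the network output (treated here as a likelihood)."""
    predicted = transition.T @ belief
    posterior = predicted * network_probs
    return posterior / posterior.sum()


def filter_point(network_outputs, transition):
    """Run the filter over the per-scan network outputs for one point,
    starting from a uniform belief."""
    belief = np.full(len(CLASSES), 1.0 / len(CLASSES))
    beliefs = []
    for probs in network_outputs:
        belief = bayes_step(belief, np.asarray(probs, dtype=float), transition)
        beliefs.append(belief)
    return np.array(beliefs)


# Made-up network outputs for one point over four scans.
# The third scan contains an isolated erroneous prediction (pedestrian).
network_outputs = [
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.2, 0.6, 0.1],
    [0.1, 0.7, 0.1, 0.1],
]

beliefs = filter_point(network_outputs, make_transition(len(CLASSES)))
raw_labels = [CLASSES[int(np.argmax(p))] for p in network_outputs]
filtered_labels = [CLASSES[int(np.argmax(b))] for b in beliefs]

print("raw:     ", raw_labels)
print("filtered:", filtered_labels)
```

In this toy sequence the raw per-scan label flips to pedestrian in the third scan, while the filtered label stays car because the belief accumulated over the first two scans outweighs the single outlier. The value of `stay_prob` controls the trade-off: higher values smooth more aggressively but react more slowly when a point's class genuinely changes.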