Label-guided Attention Distillation for Lane Segmentation
This work addresses a specific bottleneck in lane segmentation for autonomous driving and traffic analysis, but it is incremental, as it builds on existing distillation and attention methods.
The paper tackles the problem of capturing long-range contexts like lane markers in segmentation by proposing Label-guided Attention Distillation (LGAD), which uses a teacher network trained on label maps to guide a student network's attention, resulting in significantly better learning without increasing inference time.
Contemporary segmentation methods are usually based on deep fully convolutional networks (FCNs). However, layer-by-layer convolutions with a growing receptive field are not good at capturing long-range contexts such as lane markers in the scene. In this paper, we address this issue by designing a distillation method that exploits label structure when training a segmentation network. The intuition is that the ground-truth lane annotations themselves exhibit internal structure. We broadcast the structure hints throughout a teacher network, i.e., we train a teacher network that consumes a lane label map as input and attempts to replicate it as output. Then, the attention maps of the teacher network are adopted as supervisors of the student segmentation network. The teacher network, with label structure information embedded, knows distinctly where the convolution layers should direct their visual attention. The proposed method is named Label-guided Attention Distillation (LGAD). It turns out that the student network learns significantly better with LGAD than when learning alone. As the teacher network is discarded after training, our method does not increase the inference time. Note that LGAD can be easily incorporated into any lane segmentation network.
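To make the idea of using teacher attention maps as supervisors concrete, the following is a minimal sketch rather than the paper's implementation. The abstract does not say how attention maps are computed or compared, so the sketch assumes a common activation-based formulation: the attention map is the channel-wise sum of squared activations, flattened and L2-normalized, and the student is penalized by the mean squared difference to the teacher's map. The function names, the nested-list feature layout (channels x height x width), and the `weight` parameter are all hypothetical.

```python
import math


def attention_map(features):
    """Collapse a C x H x W activation (nested lists) into a flat attention vector.

    Assumption (not from the paper): attention at each pixel is the sum of
    squared activations over channels, and the result is L2-normalized.
    """
    channels = len(features)
    height = len(features[0])
    width = len(features[0][0])
    flat = [
        sum(features[c][y][x] ** 2 for c in range(channels))
        for y in range(height)
        for x in range(width)
    ]
    # Fall back to 1.0 so an all-zero activation does not divide by zero.
    norm = math.sqrt(sum(v * v for v in flat)) or 1.0
    return [v / norm for v in flat]


def attention_distillation_loss(student_features, teacher_features):
    """Mean squared difference between student and teacher attention maps.

    Only the spatial size must match; channel counts may differ because
    channels are collapsed before the comparison.
    """
    student_att = attention_map(student_features)
    teacher_att = attention_map(teacher_features)
    return sum((s - t) ** 2 for s, t in zip(student_att, teacher_att)) / len(student_att)


def total_loss(seg_loss, student_layers, teacher_layers, weight=0.1):
    """Segmentation loss plus weighted distillation terms over paired layers.

    `weight` is a hypothetical hyperparameter; the teacher is only needed
    here, during training, and plays no role at inference.
    """
    distill = sum(
        attention_distillation_loss(s, t)
        for s, t in zip(student_layers, teacher_layers)
    )
    return seg_loss + weight * distill
```

Because the maps are normalized, the loss in this sketch depends on where a layer's activations concentrate and not on their overall magnitude, which is one plausible reading of teaching the student "where to look". In a real training loop these operations would be written with a tensor library so that gradients flow into the student.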