A Survey of Vision Transformers in Autonomous Driving: Current Trends and Future Directions
The survey provides a comprehensive overview for researchers and practitioners in autonomous driving, but its contribution is incremental: it synthesizes existing trends without introducing new methods.
This survey examines the use of Vision Transformers in autonomous driving and highlights their advantages over traditional methods such as RNNs and CNNs in tasks like object detection and scene recognition. Because it is a review paper, it does not report specific numerical results.
This survey explores the adaptation of Vision Transformer models to Autonomous Driving, a transition inspired by the success of Transformers in Natural Language Processing. Transformers are gaining traction in computer vision: they surpass traditional Recurrent Neural Networks in tasks such as sequential image processing and outperform Convolutional Neural Networks at capturing global context, as evidenced in complex scene recognition. These capabilities are crucial in Autonomous Driving, which requires real-time processing of dynamic visual scenes. Our survey provides a comprehensive overview of Vision Transformer applications in Autonomous Driving, focusing on foundational concepts such as self-attention, multi-head attention, and the encoder-decoder architecture. We cover applications in object detection, segmentation, pedestrian detection, lane detection, and more, comparing their architectural merits and limitations. The survey concludes with future research directions, highlighting the growing role of Vision Transformers in Autonomous Driving.
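The abstract names self-attention and multi-head attention as foundational concepts and credits Transformers with better global context capture than CNNs. The following is a minimal pure-Python sketch, not taken from the survey, of ViT-style multi-head self-attention over image patches: each patch token attends to every other patch token in a single layer. It assumes a single-channel toy image and random (untrained) weights, and it omits positional encodings, layer normalization, and the feed-forward sublayer. Helper names such as `image_to_patches` and `random_matrix` are hypothetical.

```python
import math
import random

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax(row: list[float]) -> list[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    # Hypothetical stand-in for learned weights (untrained, Gaussian init).
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def image_to_patches(image: Matrix, patch: int) -> Matrix:
    """Split a single-channel H x W image into non-overlapping patch x patch
    tiles and flatten each tile into one token vector (edge remainders dropped)."""
    h, w = len(image), len(image[0])
    return [
        [image[r + i][c + j] for i in range(patch) for j in range(patch)]
        for r in range(0, h - h % patch, patch)
        for c in range(0, w - w % patch, patch)
    ]


def attention_weights(q: Matrix, k: Matrix) -> Matrix:
    """Row i holds softmax(q_i . k_j / sqrt(d_k)) over all tokens j."""
    scale = math.sqrt(len(k[0]))
    return [softmax([s / scale for s in row]) for row in matmul(q, transpose(k))]


def multi_head_self_attention(x: Matrix, num_heads: int, rng: random.Random) -> Matrix:
    """Each head projects tokens to Q, K, V and attends over *all* tokens;
    head outputs are concatenated and mixed by an output projection."""
    d_model = len(x[0])
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d_model // num_heads
    heads = []
    for _ in range(num_heads):
        q, k, v = (matmul(x, random_matrix(d_model, d_head, rng)) for _ in range(3))
        heads.append(matmul(attention_weights(q, k), v))
    # Concatenate the per-head outputs along the feature axis for each token.
    concat = [[val for h in heads for val in h[i]] for i in range(len(x))]
    return matmul(concat, random_matrix(d_model, d_model, rng))


rng = random.Random(0)
image = [[rng.random() for _ in range(8)] for _ in range(8)]  # toy 8x8 grayscale frame
patches = image_to_patches(image, patch=4)                   # 4 tokens of 16 pixels each
tokens = matmul(patches, random_matrix(16, 8, rng))           # linear patch embedding, d_model = 8
out = multi_head_self_attention(tokens, num_heads=2, rng=rng)
print(len(out), len(out[0]))  # 4 8
```

Because every softmax weight is positive, each output token mixes information from all patches within one layer, whereas a single convolution layer only sees a local neighborhood. Real systems would use a framework implementation such as PyTorch's `torch.nn.MultiheadAttention` rather than this list-based sketch.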