Synthetic Datasets for Autonomous Driving: A Survey
This is an incremental survey that synthesizes existing research on synthetic datasets to aid researchers in autonomous driving by providing a comprehensive overview and potential solutions for data scarcity issues.
This survey addresses the challenge of obtaining high-quality data for autonomous driving by reviewing synthetic dataset generation methods as a complement to real-world data, highlighting their role in improving algorithm performance and testing for trustworthiness and safety.
Autonomous driving techniques have flourished in recent years and demand huge amounts of high-quality data. However, real-world datasets struggle to keep pace with changing requirements because collecting and labeling them is expensive and time-consuming. Therefore, more and more researchers are turning to synthetic datasets, which make it easy to generate rich and variable data, as an effective complement to real-world data and a way to improve algorithm performance. In this paper, we summarize the evolution of synthetic dataset generation methods and review the work to date on synthetic datasets for single-task and multi-task categories in autonomous driving research. We also discuss the role that synthetic datasets play in evaluation and gap testing, and their positive effect on the testing of autonomous driving algorithms, especially with respect to trustworthiness and safety. Finally, we discuss general trends and possible development directions. To the best of our knowledge, this is the first survey focusing on the application of synthetic datasets in autonomous driving. This survey also raises awareness of the problems of real-world deployment of autonomous driving technology and provides researchers with a possible solution.