Automated Data Curation Using GPS & NLP to Generate Instruction-Action Pairs for Autonomous Vehicle Vision-Language Navigation Datasets
This work addresses the need for efficient data curation in vision-language navigation for autonomous vehicles, though it appears incremental as it builds on existing GPS and NLP technologies.
The paper tackles the problem of costly manual annotation for instruction-action data pairs in autonomous vehicles by proposing an automated system using GPS and NLP to generate these pairs, demonstrating a prototype that collects and categorizes voice instructions from GPS applications to form vision-language-action triads.
Instruction-Action (IA) data pairs are valuable for training robotic systems, especially autonomous vehicles (AVs), but having humans manually annotate this data is costly and time-inefficient. This paper explores the potential of using mobile application Global Positioning System (GPS) references and Natural Language Processing (NLP) to automatically generate large volumes of IA commands and responses without having a human generate or retroactively tag the data. In our pilot data collection, by driving to various destinations and collecting voice instructions from GPS applications, we demonstrate a means to collect and categorize the diverse sets of instructions, further accompanied by video data to form complete vision-language-action triads. We provide details on our completely automated data collection prototype system, ADVLAT-Engine. We characterize collected GPS voice instructions into eight different classifications, highlighting the breadth of commands and referentialities available for curation from freely available mobile applications. Through research and exploration into the automation of IA data pairs using GPS references, the potential to increase the speed and volume at which high-quality IA datasets are created, while minimizing cost, can pave the way for robust vision-language-action (VLA) models to serve tasks in vision-language navigation (VLN) and human-interactive autonomous systems.