Universal Embeddings for Spatio-Temporal Tagging of Self-Driving Logs
This addresses the problem of automated scene understanding for self-driving systems, though it appears incremental as it builds on existing embedding methods for tagging tasks.
The paper tackles spatio-temporal tagging of self-driving scenes from raw sensor data by learning a universal embedding, enabling efficient tagging of multiple attributes and faster learning of new ones with limited data, and demonstrates effectiveness on a new large-scale dataset SDVScenes with 15 attributes.
In this paper, we tackle the problem of spatio-temporal tagging of self-driving scenes from raw sensor data. Our approach learns a universal embedding for all tags, enabling efficient tagging of many attributes and faster learning of new attributes with limited data. Importantly, the embedding is spatio-temporally aware, allowing the model to naturally output spatio-temporal tag values. Values can then be pooled over arbitrary regions, in order to, for example, compute the pedestrian density in front of the SDV, or determine if a car is blocking another car at a 4-way intersection. We demonstrate the effectiveness of our approach on a new large scale self-driving dataset, SDVScenes, containing 15 attributes relating to vehicle and pedestrian density, the actions of each actor, the speed of each actor, interactions between actors, and the topology of the road map.