TagMe: GPS-Assisted Automatic Object Annotation in Videos
This work addresses the need for low-cost, scalable annotation in computer vision, particularly for outdoor video streams. Its contribution is incremental, however, since it builds on existing motion-matching techniques and combines them with GPS data.
The paper tackles the expense and time cost of manually annotating videos for object detection. It introduces TagMe, a GPS-assisted automatic annotation system that matches an object's motion, derived from its GPS trace, against pixel motion in the video to generate bounding boxes. The authors report a cost reduction of up to 110x compared to human annotation.
Training high-accuracy object detection models requires large and diverse annotated datasets. However, creating these datasets is time-consuming and expensive since it relies on human annotators. We design, implement, and evaluate TagMe, a new approach for automatic object annotation in videos that uses GPS data. When the GPS trace of an object is available, TagMe matches the object's motion from its GPS trace against the motion of pixels in the video to find the pixels that belong to the object, and then creates bounding box annotations for it. TagMe works using passive data collection and can continuously generate new object annotations from outdoor video streams without any human annotators. We evaluate TagMe on a dataset of 100 video clips. We show that TagMe can produce high-quality object annotations in a fully automatic and low-cost way. Compared with the traditional human-in-the-loop solution, TagMe can produce the same number of annotations at a much lower cost, with savings of up to 110x.
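The motion-matching idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the GPS trace has already been converted into an expected per-frame motion in image coordinates (a step the abstract does not detail), that per-pixel motion is available as a dense flow array, and that a simple mean-distance threshold is an acceptable similarity test. The function name `annotate_from_motion`, the `max_error` parameter, and the single box per clip are all hypothetical choices made for illustration.

```python
import numpy as np

def annotate_from_motion(flow, gps_motion, max_error=0.5):
    """Return one bounding box around pixels whose motion matches the GPS-derived motion.

    flow:       array of shape (T, H, W, 2), per-pixel (dx, dy) motion for T frames.
    gps_motion: array of shape (T, 2), expected object motion per frame in pixels
                (assumed to be already projected from the GPS trace).
    max_error:  hypothetical threshold on the mean motion mismatch, in pixels.
    Returns (x_min, y_min, x_max, y_max), or None if no pixel matches.
    """
    flow = np.asarray(flow, dtype=float)
    gps_motion = np.asarray(gps_motion, dtype=float)

    # Mismatch between every pixel's motion and the expected motion, per frame.
    diff = flow - gps_motion[:, None, None, :]
    # Average the Euclidean mismatch over time to get one score per pixel.
    error = np.linalg.norm(diff, axis=-1).mean(axis=0)

    # Pixels that move like the GPS-tracked object are assigned to it.
    mask = error <= max_error
    if not mask.any():
        return None

    ys, xs = np.nonzero(mask)
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

# Synthetic example: a static background with one patch moving 2 px/frame to the right.
T, H, W = 5, 20, 30
gps_motion = np.tile([2.0, 0.0], (T, 1))
flow = np.zeros((T, H, W, 2))
flow[:, 5:10, 10:18] = gps_motion[:, None, None, :]

print(annotate_from_motion(flow, gps_motion))  # prints (10, 5, 17, 9)
```

In this toy setting the moving patch stays at fixed pixel coordinates, which keeps the example short; a real pipeline would need per-frame boxes, a dense flow estimator, and handling for other objects that happen to move in a similar way.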