CV · Oct 22, 2019

Towards Automatic Annotation for Semantic Segmentation in Drone Videos

arXiv:1910.10026v1 · 9 citations
Originality: Incremental advance
AI Analysis

This paper addresses the high cost of labeling drone video, enabling more efficient dataset creation for researchers and practitioners in aerial robotics and computer vision. The advance is incremental, as it builds on existing label propagation methods.

The authors tackle the lack of large annotated datasets for semantic segmentation in drone videos by introducing a new dataset with manual annotations and proposing SegProp, an iterative flow-based method for automatic label propagation. The result is a dataset of over 50k annotated frames and a 16.8% mean F-measure boost in segmentation performance.

Semantic segmentation is a crucial task for robot navigation and safety. However, it requires huge amounts of pixelwise annotations to yield accurate results. While recent progress in computer vision algorithms has been heavily boosted by large ground-level datasets, the labeling time has hampered progress in low-altitude UAV applications, mostly due to the difficulty imposed by large object scales and pose variations. Motivated by the lack of a large video aerial dataset, we introduce a new one, with high resolution (4K) images and manually-annotated dense labels every 50 frames. To help the video labeling process, we make an important step towards automatic annotation and propose SegProp, an iterative flow-based method with geometric constraints to propagate the semantic labels to frames that lack human annotations. This results in a dataset with more than 50k annotated frames - the largest of its kind, to the best of our knowledge. Our experiments show that SegProp surpasses current state-of-the-art label propagation methods by a significant margin. Furthermore, when training a semantic segmentation deep neural net using the automatically annotated frames, we obtain a compelling overall performance boost at test time of 16.8% mean F-measure over a baseline trained only with manually-labeled frames. Our Ruralscapes dataset, the label propagation code and a fast segmentation tool are available at our website: https://sites.google.com/site/aerialimageunderstanding/
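The general idea of flow-based label propagation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' SegProp implementation: it omits the iterative refinement and the geometric constraints, and the function names (`warp_labels`, `propagate_between`), the backward-flow convention, and the `weight_a` voting weight are all assumptions made for illustration. It warps the label maps of two annotated frames into an unlabeled frame using dense flow fields (assumed to be precomputed) and takes a weighted per-pixel vote.

```python
import numpy as np

def warp_labels(labels, flow):
    """Backward-warp a label map with a dense flow field (nearest neighbour).

    labels: (H, W) integer class ids of an annotated source frame.
    flow:   (H, W, 2) array; flow[y, x] = (dx, dy) points from pixel (x, y)
            in the unlabeled target frame to its match in the source frame.
            (This convention is an assumption of the sketch.)
    Returns the warped labels and a mask of pixels whose match is in bounds.
    """
    h, w = labels.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.rint(xs + flow[..., 0]).astype(int)
    src_y = np.rint(ys + flow[..., 1]).astype(int)
    valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    warped = np.zeros_like(labels)
    warped[valid] = labels[src_y[valid], src_x[valid]]
    return warped, valid

def propagate_between(labels_a, flow_to_a, labels_b, flow_to_b,
                      num_classes, weight_a=0.5):
    """Label an unannotated frame by voting between two annotated neighbours.

    weight_a is a hypothetical knob, e.g. larger when the target frame is
    temporally closer to frame A than to frame B.
    """
    votes = np.zeros((num_classes,) + labels_a.shape)
    sources = ((labels_a, flow_to_a, weight_a),
               (labels_b, flow_to_b, 1.0 - weight_a))
    for labels, flow, weight in sources:
        warped, valid = warp_labels(labels, flow)
        for c in range(num_classes):
            # Only pixels with an in-bounds match contribute a vote.
            votes[c] += weight * (valid & (warped == c))
    return votes.argmax(axis=0)

# Toy usage: two annotated frames that disagree, target closer to frame A.
frame_a = np.zeros((2, 3), dtype=int)
frame_b = np.ones((2, 3), dtype=int)
no_motion = np.zeros((2, 3, 2))
middle = propagate_between(frame_a, no_motion, frame_b, no_motion,
                           num_classes=2, weight_a=0.7)
```

In practice the flow fields would come from an optical flow estimator, and pixels that receive no valid vote would need separate handling; the sketch simply assigns them class 0.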

