CV · Jul 19, 2018

Modeling Visual Context is Key to Augmenting Object Detection Datasets

arXiv:1807.07428v1 · 276 citations
AI Analysis

This work addresses the challenge of improving object detection performance when labeled data is limited. It is an incremental advance in data augmentation techniques for computer vision.

The paper tackles data augmentation for object detection by using segmentation annotations to add object instances to training images. It shows that modeling visual context is crucial to avoid degrading performance, and it reports significant mean average precision improvements on the VOC'12 benchmark when few labeled examples are available.

Data augmentation is well known to be important when training deep neural networks for visual recognition. By artificially increasing the number of training examples, it helps reduce overfitting and improves generalization. For object detection, classical data augmentation approaches generate new images by applying basic geometrical transformations and color changes to the original training images. In this work, we go one step further and leverage segmentation annotations to increase the number of object instances present in the training data. For this approach to be successful, we show that appropriately modeling the visual context surrounding objects is crucial to placing them in the right environment. Otherwise, we show that this strategy actually hurts performance. With our context model, we achieve significant mean average precision improvements on the VOC'12 benchmark when few labeled examples are available.
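The instance-pasting step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `paste_instance` and its parameters are hypothetical, NumPy is assumed, and the paste location is simply passed in by the caller. In the paper, choosing that location is the job of the context model, which this sketch does not attempt to reproduce.

```python
import numpy as np


def paste_instance(image, instance, mask, top, left):
    """Paste a segmented object into a copy of `image` at (top, left).

    Hypothetical helper for illustration only.
    image:    H x W x 3 uint8 array (the scene to augment)
    instance: h x w x 3 uint8 array (crop containing the object)
    mask:     h x w boolean array, True on object pixels
    Returns the augmented image and the new bounding box as
    (x_min, y_min, x_max, y_max), or None for the box when no
    object pixel lands inside the image.
    """
    out = image.copy()
    H, W = image.shape[:2]
    h, w = mask.shape

    # Clip the paste window to the image borders.
    y0, x0 = max(top, 0), max(left, 0)
    y1, x1 = min(top + h, H), min(left + w, W)
    if y0 >= y1 or x0 >= x1:
        return out, None

    # Matching window inside the instance crop.
    sub_mask = mask[y0 - top:y1 - top, x0 - left:x1 - left]
    sub_inst = instance[y0 - top:y1 - top, x0 - left:x1 - left]
    if not sub_mask.any():
        return out, None

    # Copy only the pixels that belong to the object
    # (`region` is a view, so this writes into `out`).
    region = out[y0:y1, x0:x1]
    region[sub_mask] = sub_inst[sub_mask]

    # Tight box around the pasted pixels, in image coordinates.
    ys, xs = np.nonzero(sub_mask)
    box = (int(x0 + xs.min()), int(y0 + ys.min()),
           int(x0 + xs.max()) + 1, int(y0 + ys.max()) + 1)
    return out, box
```

The returned box would be appended to the image's detection annotations so the pasted object counts as a new training instance. A realistic pipeline would also blend the object boundary and, following the paper's argument, only paste at locations where the object is plausible given its surroundings.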

Code Implementations: 2 repos
