Evaluating Context for Deep Object Detectors
This work offers practitioners guidance on choosing an object detector for a given application context, although its contribution is incremental because it builds on existing detector categories.
The paper systematically evaluates how three categories of deep object detectors (RCNN, two-stage, and single-stage) use scene context for recognition. It finds that single-stage and two-stage detectors exploit context by virtue of their large receptive fields, based on experiments on a fully controlled dataset and on MS COCO.
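To make the receptive-field argument concrete, the following Python sketch computes the receptive field of a convolutional backbone with the standard recurrence (each layer widens the field by (k - 1) times the current jump, and the jump multiplies by the stride). It then estimates, along one image axis, which pixels each detector category can see for a given box. The VGG-16 conv5_3 layer list is an assumption chosen for illustration, not necessarily a backbone used in the paper. `visible_extent` is a hypothetical helper: it approximates the two-stage case by padding the box with half a receptive field on each side, and treats the single-stage case as the full image, following the paper's "full context" characterization of single-stage methods.

```python
from typing import List, Tuple

# A layer is (kernel_size, stride, padding), applied in order from the input.
Layer = Tuple[int, int, int]


def receptive_field(layers: List[Layer]) -> Tuple[int, int, float]:
    """Return (size, jump, start) for one cell of the last layer, in input pixels.

    size:  width of the input region that can influence one output cell
    jump:  distance in input pixels between neighbouring output cells
    start: input coordinate of the centre of output cell 0
           (input pixel i is taken to be centred at i + 0.5)
    """
    size, jump, start = 1, 1, 0.5
    for k, s, p in layers:
        size += (k - 1) * jump             # each layer widens the field by (k-1) steps
        start += ((k - 1) / 2 - p) * jump  # shift from kernel centre and padding
        jump *= s                          # stride spaces output cells further apart
    return size, jump, start


# Assumed backbone for illustration only: VGG-16 up to conv5_3.
CONV = (3, 1, 1)
POOL = (2, 2, 0)
VGG16_CONV5_3 = ([CONV] * 2 + [POOL] + [CONV] * 2 + [POOL]
                 + [CONV] * 3 + [POOL] + [CONV] * 3 + [POOL] + [CONV] * 3)


def visible_extent(category: str, box: Tuple[float, float],
                   image_width: float, rf_size: int) -> Tuple[float, float]:
    """Hypothetical helper: span of one image axis a detector can see for `box`."""
    x0, x1 = box
    if category == "rcnn":
        # Input cropped to the box: no context.
        return (x0, x1)
    if category == "two_stage":
        # Feature map cropped to the box, but each pooled cell sees rf_size
        # input pixels; crude approximation: pad the box by half a field.
        half = rf_size / 2
        return (max(0.0, x0 - half), min(float(image_width), x1 + half))
    if category == "single_stage":
        # No cropping: the paper's "full context". In practice a prediction is
        # still bounded by its output layer's receptive field.
        return (0.0, float(image_width))
    raise ValueError(f"unknown category: {category}")


size, jump, start = receptive_field(VGG16_CONV5_3)
print(size, jump, start)  # 196 16 8.0
for cat in ("rcnn", "two_stage", "single_stage"):
    print(cat, visible_extent(cat, (200, 280), 640, size))
```

With this assumed backbone a conv5_3 cell covers 196 input pixels per axis, so even after cropping the feature map, the features pooled for an 80-pixel-wide box draw on roughly 276 pixels of the row, whereas the RCNN-style crop sees only the 80 pixels inside the box.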
Which object detector is suitable for your context-sensitive task? Deep object detectors differ in how they exploit scene context for recognition. In this paper, we group object detectors into three categories in terms of context use: no context, by cropping the input (RCNN); partial context, by cropping the feature map (two-stage methods); and full context, without any cropping (single-stage methods). We systematically evaluate the effect of context for each deep detector category. We create a fully controlled dataset in which the context can be varied, and use it to investigate how deep detectors use context. On MS COCO, we also evaluate the effect of gradually removing the background context and the foreground object. We demonstrate that single-stage and two-stage object detectors can and will use the context by virtue of their large receptive fields. Thus, choosing the best object detector may depend on the application context.
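The MS COCO experiments remove background context gradually and remove the foreground object. A minimal sketch of such masking might look as follows. It assumes a grayscale image stored as nested lists, a rectangular context margin around the ground-truth box, and a constant fill value of 0.0. All three are hypothetical choices, since the paper's exact masking protocol is not described here. `keep_context` and `remove_object` are illustrative names, not functions from the paper.

```python
from typing import List, Tuple

Image = List[List[float]]          # grayscale image, image[y][x]
Box = Tuple[int, int, int, int]    # (x0, y0, x1, y1), upper bounds exclusive


def keep_context(image: Image, box: Box, margin: int, fill: float = 0.0) -> Image:
    """Hypothetical masking: blank every pixel more than `margin` px outside `box`."""
    x0, y0, x1, y1 = box
    h, w = len(image), len(image[0])
    # Kept window = box grown by `margin` on each side, clipped to the image.
    kx0, ky0 = max(0, x0 - margin), max(0, y0 - margin)
    kx1, ky1 = min(w, x1 + margin), min(h, y1 + margin)
    return [
        [image[y][x] if (kx0 <= x < kx1 and ky0 <= y < ky1) else fill
         for x in range(w)]
        for y in range(h)
    ]


def remove_object(image: Image, box: Box, fill: float = 0.0) -> Image:
    """Hypothetical masking: blank the pixels inside `box`, keep the context."""
    x0, y0, x1, y1 = box
    return [
        [fill if (x0 <= x < x1 and y0 <= y < y1) else v for x, v in enumerate(row)]
        for y, row in enumerate(image)
    ]


img = [[1.0] * 8 for _ in range(8)]    # toy 8x8 image of ones
box = (3, 3, 5, 5)                     # 2x2 "object"
for margin in (4, 2, 1, 0):            # progressively less background context
    kept = sum(v != 0.0 for row in keep_context(img, box, margin) for v in row)
    print(margin, kept)                # prints 4 64, then 2 36, 1 16, 0 4
```

Shrinking the margin to zero leaves only the object, mimicking the no-context input an RCNN-style crop sees. `remove_object` gives the complementary test of whether a detector can still fire on context alone.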