Analysing object detectors from the perspective of co-occurring object categories
This work addresses how object detectors use contextual knowledge, and whether that knowledge could be transferred, a question of interest to computer vision researchers. It is incremental: it analyzes existing methods without introducing new techniques.
The study evaluates Faster R-CNN and YOLO object detectors on a masked MS COCO dataset to measure their reliance on contextual information at the object category level. It finds that the detectors generally do not build strong dependencies, but when they do, they do so in a similar way, suggesting that contextual dependence is an independent property of object categories.
The accuracy of the state-of-the-art Faster R-CNN and YOLO object detectors is evaluated and compared on a special masked MS COCO dataset to measure how much their predictions rely on contextual information encoded at the object category level. A category-level representation of context is motivated by the fact that it could be an adequate way to transfer knowledge between visual and non-visual domains. According to our measurements, current detectors usually do not build a strong dependency on contextual information at the category level; however, when they do, they do so in a similar way, suggesting that the contextual dependence of object categories is an independent property that is relevant to transfer.
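To make the measurement idea concrete, the following is a minimal sketch of one way category-level masking and a dependence score could be set up. It is an illustration under stated assumptions, not the procedure used in this work: the masking rule (filling the bounding boxes of one co-occurring category with a constant), the relative-drop score, and the names `mask_category` and `context_dependence` are all hypothetical. Boxes are assumed to follow the MS COCO `(x, y, width, height)` convention, rounded to integer pixels.

```python
def mask_category(image, annotations, category, fill=0):
    """Return a copy of `image` with every box of `category` set to `fill`.

    `image` is a list of pixel rows; `annotations` is a list of dicts with
    a "category" name and a "bbox" of (x, y, width, height).
    Assumption: masking means overwriting box pixels with a constant.
    """
    masked = [row[:] for row in image]  # copy so the input stays untouched
    height = len(masked)
    width = len(masked[0]) if height else 0
    for ann in annotations:
        if ann["category"] != category:
            continue
        x, y, w, h = ann["bbox"]
        # Clip the box to the image bounds before filling.
        for r in range(max(0, y), min(height, y + h)):
            for c in range(max(0, x), min(width, x + w)):
                masked[r][c] = fill
    return masked


def context_dependence(score_original, score_masked):
    """Relative drop in a target category's score after masking a context category.

    Assumption: the score is any accuracy-like number (for example AP) measured
    for the target category with and without the context category visible.
    """
    if score_original == 0:
        return 0.0
    return (score_original - score_masked) / score_original


# Toy example: a 4x6 single-channel image with one "person" and one "skis" box.
image = [[1] * 6 for _ in range(4)]
annotations = [
    {"category": "person", "bbox": (0, 0, 2, 2)},
    {"category": "skis", "bbox": (3, 1, 2, 3)},
]
masked = mask_category(image, annotations, "skis")
drop = context_dependence(0.8, 0.6)  # placeholder scores, not measured results
```

Under this sketch, a value of `context_dependence` near zero would indicate that the detector's score for the target category barely changes when the co-occurring category is hidden, while a larger value would indicate a stronger category-level dependence. Comparing these values across detectors, category pair by category pair, is one way to check whether two detectors depend on context in a similar way.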