Inverting and Understanding Object Detectors
This work addresses the problem of understanding and improving object detectors for computer vision practitioners, though its contribution is incremental because it builds on existing visualization techniques.
The paper tackles the lack of interpretability in object detectors by proposing an optimization-based layout inversion method, which generates synthetic images that trained detectors recognize as containing a desired configuration of objects. Applying the method reveals several properties of detectors, such as their reliance on different features for classification and regression and their learning of canonical motifs of commonly co-occurring objects.
Object detection is a core problem in computer vision, and its performance has improved drastically in the past few years. Despite their impressive performance, object detectors suffer from a lack of interpretability. Visualization techniques have been developed and widely applied to introspect the decisions made by other kinds of deep learning models; however, visualizing object detectors has been underexplored. In this paper, we propose using inversion as a primary tool to understand modern object detectors and develop an optimization-based approach to layout inversion, allowing us to generate synthetic images recognized by trained detectors as containing a desired configuration of objects. We reveal intriguing properties of detectors by applying our layout inversion technique to a variety of modern object detectors, and further investigate them via validation experiments: they rely on qualitatively different features for classification and regression; they learn canonical motifs of commonly co-occurring objects; they use different visual cues to recognize objects of varying sizes. We hope our insights can help practitioners improve object detectors.
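To make the idea of optimization-based layout inversion concrete, the following is a minimal toy sketch, not the method used in the paper. It assumes a hypothetical 1-D "detector" (a fixed linear template with a sigmoid score at each sliding-window position) and runs gradient descent on the input signal until the detector's outputs match a desired layout. The names `TEMPLATE`, `BIAS`, `detect`, and `invert_layout`, along with the cross-entropy loss, the weight decay, and all hyperparameters, are illustrative assumptions; a real detector would require automatic differentiation and most likely additional image priors.

```python
import math

# Hypothetical toy detector (an assumption for illustration, not a real model):
# a fixed linear template slid over a 1-D signal, followed by a sigmoid score.
TEMPLATE = [-1.0, 2.0, -1.0]
BIAS = -1.0


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def detect(image):
    """Return one objectness score per sliding-window position."""
    k = len(TEMPLATE)
    return [
        sigmoid(sum(TEMPLATE[j] * image[p + j] for j in range(k)) + BIAS)
        for p in range(len(image) - k + 1)
    ]


def invert_layout(target, length, steps=500, lr=0.2, weight_decay=0.01):
    """Optimize the input so that detect(input) matches the target layout.

    Minimizes binary cross-entropy between detector scores and the target,
    plus an L2 penalty on the input, using plain gradient descent.
    """
    image = [0.0] * length  # start from a blank signal
    for _ in range(steps):
        scores = detect(image)
        # Gradient of the L2 penalty with respect to each input value.
        grad = [weight_decay * x for x in image]
        # For sigmoid + cross-entropy, d(loss)/d(logit) = score - target;
        # the chain rule through the linear template gives the input gradient.
        for p, (s, t) in enumerate(zip(scores, target)):
            for j, w in enumerate(TEMPLATE):
                grad[p + j] += (s - t) * w
        image = [x - lr * g for x, g in zip(image, grad)]
    return image


# Desired layout: objects at window positions 2 and 7, nothing elsewhere.
target_layout = [0, 0, 1, 0, 0, 0, 0, 1, 0, 0]
synthetic = invert_layout(target_layout, length=12)
detected = [p for p, s in enumerate(detect(synthetic)) if s > 0.5]
print(detected)
```

The detector's parameters stay frozen throughout; only the input is updated. Inspecting `synthetic` then shows which input pattern this toy detector associates with the requested layout, which is the same kind of question the inversion approach asks of real detectors.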