Learning a Layout Transfer Network for Context Aware Object Detection
This work addresses object detection in traffic surveillance and autonomous driving by integrating scene layout estimation into Faster RCNN, though the contribution may appear incremental because it builds on that existing detector.
The paper tackles object detection by retrieving and refining scene layouts to improve context awareness, resulting in consistent performance gains on traffic surveillance and autonomous driving datasets.
We present a context-aware object detection method based on a retrieve-and-transform scene layout model. Given an input image, our approach first retrieves a coarse scene layout from a codebook of typical layout templates. To handle large layout variations, we use a variant of the spatial transformer network to transform and refine the retrieved layout, resulting in a set of interpretable and semantically meaningful feature maps of object locations and scales. These steps are implemented as a Layout Transfer Network, which we integrate into Faster RCNN to allow for joint reasoning about object detection and scene layout estimation. Extensive experiments on three public datasets verify that our approach provides consistent performance improvements over state-of-the-art object detection baselines on a variety of challenging tasks in the traffic surveillance and autonomous driving domains.
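The retrieve-and-transform idea can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes the retrieval scores and the 2x3 affine parameters `theta` are already available (in the paper they would be predicted by the network), and it uses a plain affine warp with bilinear sampling as a stand-in for the spatial transformer variant. All function names here are hypothetical.

```python
import numpy as np

def retrieve_layout(scores, codebook):
    """Pick the codebook template with the highest retrieval score."""
    idx = int(np.argmax(scores))
    return idx, codebook[idx]

def affine_grid(theta, height, width):
    """For each output pixel, compute source (x, y) in normalized [-1, 1] coords.

    Assumption: corner pixel centres map to -1 and 1.
    """
    ys = np.linspace(-1.0, 1.0, height)
    xs = np.linspace(-1.0, 1.0, width)
    grid_y, grid_x = np.meshgrid(ys, xs, indexing="ij")
    coords = np.stack([grid_x, grid_y, np.ones_like(grid_x)], axis=-1)  # (H, W, 3)
    return coords @ theta.T  # (H, W, 2)

def bilinear_sample(layout, grid):
    """Sample a 2D layout map at the grid locations; outside pixels read as 0."""
    height, width = layout.shape
    # Convert normalized coordinates back to pixel coordinates.
    x = (grid[..., 0] + 1.0) * 0.5 * (width - 1)
    y = (grid[..., 1] + 1.0) * 0.5 * (height - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1, y1 = x0 + 1, y0 + 1
    wx, wy = x - x0, y - y0

    def pixel(yy, xx):
        valid = (yy >= 0) & (yy < height) & (xx >= 0) & (xx < width)
        vals = layout[np.clip(yy, 0, height - 1), np.clip(xx, 0, width - 1)]
        return np.where(valid, vals, 0.0)

    return ((1 - wy) * (1 - wx) * pixel(y0, x0)
            + (1 - wy) * wx * pixel(y0, x1)
            + wy * (1 - wx) * pixel(y1, x0)
            + wy * wx * pixel(y1, x1))

def transfer_layout(scores, codebook, theta):
    """Retrieve a coarse template, then refine it with an affine warp."""
    idx, template = retrieve_layout(scores, codebook)
    grid = affine_grid(theta, *template.shape)
    return idx, bilinear_sample(template, grid)

# Toy usage: 3 templates of size 4x5, with made-up scores.
codebook = np.random.default_rng(0).random((3, 4, 5))
scores = np.array([0.1, 0.7, 0.2])
identity = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
idx, refined = transfer_layout(scores, codebook, identity)
```

With the identity transform the refined layout equals the retrieved template; a non-identity `theta` shifts, scales, or shears the template, which is how a small codebook could cover large layout variations. In a trainable model the hard `argmax` would typically be replaced by a differentiable alternative, which this sketch does not attempt.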