RDSNet: A New Deep Architecture for Reciprocal Object Detection and Instance Segmentation
This work addresses the need for more integrated and accurate computer vision models in applications such as autonomous driving and image analysis, although it builds incrementally on existing detection and segmentation methods.
The paper tackles joint object detection and instance segmentation by proposing RDSNet, a two-stream deep architecture that reciprocally fuses object-level and pixel-level information, and it reports improved accuracy and efficiency on the COCO dataset.
Object detection and instance segmentation are two fundamental computer vision tasks. They are closely correlated, but their relationship has not yet been fully explored in most previous work. This paper presents RDSNet, a novel deep architecture for reciprocal object detection and instance segmentation. To let the two tasks benefit each other, we design a two-stream structure that jointly learns features on both the object level (i.e., bounding boxes) and the pixel level (i.e., instance masks). Within this structure, information from the two streams is fused alternately: information on the object level introduces instance awareness and translation variance to the pixel level, and information on the pixel level in return refines the localization accuracy of objects on the object level. Specifically, a correlation module and a cropping module are proposed to yield instance masks, and a mask-based boundary refinement module is proposed to produce more accurate bounding boxes. Extensive experimental analyses and comparisons on the COCO dataset demonstrate the effectiveness and efficiency of RDSNet. The source code is available at https://github.com/wangsr126/RDSNet.
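
The reciprocal flow described above (object-level information shaping the mask, then the mask refining the box) can be illustrated with a minimal NumPy sketch. It is not the RDSNet implementation: the function names `correlate`, `crop_to_box`, and `refine_box`, the dot-product similarity, the sigmoid, the 0.5 threshold, and the tight-bounding-box refinement rule are all assumptions chosen for illustration, and the actual modules in the released code may differ.

```python
import numpy as np

def correlate(pixel_emb, obj_emb):
    """Assumed correlation step: dot product between one object embedding
    and every pixel embedding, squashed to a foreground probability."""
    logits = pixel_emb @ obj_emb          # (H, W, d) @ (d,) -> (H, W)
    return 1.0 / (1.0 + np.exp(-logits))  # sigmoid

def crop_to_box(prob, box):
    """Assumed cropping step: zero out responses outside the detected box.
    box = (x0, y0, x1, y1) in inclusive pixel coordinates."""
    x0, y0, x1, y1 = box
    out = np.zeros_like(prob)
    out[y0:y1 + 1, x0:x1 + 1] = prob[y0:y1 + 1, x0:x1 + 1]
    return out

def refine_box(mask_prob, box, thresh=0.5):
    """Assumed refinement step: replace the box with the tightest box
    around the thresholded mask; keep the old box if the mask is empty."""
    ys, xs = np.nonzero(mask_prob > thresh)
    if ys.size == 0:
        return box
    return (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))

# Toy 8x8 scene with 2-D embeddings (hypothetical values).
pixel_emb = np.zeros((8, 8, 2))
pixel_emb[..., 0] = -4.0         # background pixels
pixel_emb[2:6, 3:7, 0] = 4.0     # the object: rows 2..5, columns 3..6
pixel_emb[0, 0, 0] = 4.0         # a look-alike pixel outside the box
obj_emb = np.array([1.0, 0.0])   # object-level representation

coarse_box = (1, 1, 6, 7)        # loose detection (x0, y0, x1, y1)
mask = crop_to_box(correlate(pixel_emb, obj_emb), coarse_box)
refined = refine_box(mask, coarse_box)
print(refined)
```

In this toy setup, cropping suppresses the look-alike pixel at the image corner, which the correlation alone cannot distinguish from the object, and the refined box tightens around the mask. A hard threshold is only the simplest way to turn a mask into box boundaries; it stands in here for whatever rule the boundary refinement module actually uses.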