Non-local RoI for Cross-Object Perception
This work addresses a limitation of region-based object detection and instance segmentation: R-CNN-style methods process each region proposal independently and ignore the pairwise relationships between proposals. The contribution is incremental in that it builds on existing R-CNN frameworks rather than replacing them.
The paper tackles region-based perception by proposing a non-local region of interest (NL-RoI) module that encodes each region proposal using both its intrinsic features and its extrinsic correlations with the other proposals, improving object detection and instance segmentation performance in Faster/Mask R-CNN architectures.
We present a generic and flexible module that encodes region proposals by both their intrinsic features and their extrinsic correlations with the other proposals. The proposed non-local region of interest (NL-RoI) can be seamlessly adapted into different generalized R-CNN architectures to better address various perception tasks. We observe that existing R-CNN techniques treat RoIs independently and perform prediction based solely on the image features within each region proposal. However, the pairwise relationships between proposals could provide further useful information for detection and segmentation. NL-RoI is thus formulated to enrich each RoI representation with information from all other RoIs, yielding a simple, low-cost, yet effective module for region-based convolutional networks. Our experimental results show that NL-RoI can improve the performance of Faster/Mask R-CNN for object detection and instance segmentation.
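To make the idea of enriching each RoI with information from the others concrete, the following is a minimal sketch of a non-local operation applied to pooled RoI features. It is not the paper's implementation: the function name `nl_roi`, the projection matrices `w_theta`, `w_phi`, `w_g`, `w_out`, the scaled dot-product softmax affinity, and the residual combination are all assumptions chosen for illustration, and the abstract does not specify any of them. For simplicity the softmax here also includes each RoI's affinity with itself.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def nl_roi(roi_feats, w_theta, w_phi, w_g, w_out):
    """Hypothetical non-local block over RoI features.

    roi_feats: (N, C) array, one pooled feature vector per region proposal.
    w_theta, w_phi, w_g: (C, D) projection matrices (assumed, for illustration).
    w_out: (D, C) projection back to the input width.
    Returns the enriched features (N, C) and the pairwise affinity (N, N).
    """
    theta = roi_feats @ w_theta  # (N, D) query-like embedding
    phi = roi_feats @ w_phi      # (N, D) key-like embedding
    g = roi_feats @ w_g          # (N, D) value-like embedding

    # Pairwise affinity between every pair of RoIs; each row sums to 1.
    scores = theta @ phi.T / np.sqrt(theta.shape[1])
    affinity = softmax(scores, axis=1)

    # Each RoI aggregates the embeddings of all RoIs, weighted by affinity.
    context = affinity @ g       # (N, D)

    # Residual connection (an assumption): intrinsic features plus extrinsic context.
    return roi_feats + context @ w_out, affinity

# Toy usage with random features for 5 proposals.
rng = np.random.default_rng(0)
N, C, D = 5, 16, 8
feats = rng.standard_normal((N, C))
w_theta, w_phi, w_g = (rng.standard_normal((C, D)) * 0.1 for _ in range(3))
w_out = np.zeros((D, C))  # zero init: the block starts as an identity mapping
enriched, affinity = nl_roi(feats, w_theta, w_phi, w_g, w_out)
```

In this sketch the pairwise term costs on the order of N² · D operations for N proposals, which is small next to the convolutional backbone when N is a few hundred. Initializing `w_out` to zero is one common way to insert such a block into a pretrained detector without changing its initial behavior.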