re-OBJ: Jointly Learning the Foreground and Background for Object Instance Re-identification
The work addresses the problem of distinguishing objects with similar appearances in static environments, such as indoor scenes, and offers an incremental improvement over existing methods.
The paper tackles object instance re-identification in rigid scenes by jointly learning foreground and background features to disambiguate objects with similar appearances, showing a relative improvement of around 28.25% in rank-1 accuracy over deepSort on the ScanNet dataset.
Conventional approaches to object instance re-identification rely on matching the appearance of the target objects across a set of frames. However, learning object appearance alone may fail when the scene contains multiple objects with similar appearance or multiple instances of the same object class. This paper proposes that partial observations of the background can aid object re-identification in a rigid scene, especially a rigid environment with many recurring identical object models. Using an extension to the Mask R-CNN architecture, we learn to encode the important and distinctive information in the background jointly with the foreground, targeting rigid real-world scenarios such as indoor environments where objects are static and the camera moves around the scene. We demonstrate the effectiveness of our joint visual feature for re-identifying objects in the ScanNet dataset and show a relative improvement of around 28.25% in rank-1 accuracy over the deepSort method.
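The abstract does not specify how the foreground and background features are combined or how rank-1 accuracy is scored, so the following is only a sketch of the general idea, not the authors' Mask R-CNN-based method. It assumes fixed-length embeddings are already available, fuses them by concatenating L2-normalized foreground and background vectors (`joint_descriptor` and its `bg_weight` parameter are hypothetical names), and computes rank-1 accuracy by Euclidean nearest-neighbour matching against a gallery. All embedding values are made-up toy numbers, not ScanNet data.

```python
import numpy as np


def normalize(v):
    """Scale a vector to unit L2 norm (small epsilon guards against zero vectors)."""
    v = np.asarray(v, dtype=float)
    return v / (np.linalg.norm(v) + 1e-12)


def joint_descriptor(fg, bg, bg_weight=1.0):
    """Hypothetical fusion: concatenate normalized foreground and background embeddings.

    bg_weight is an illustrative knob, not a parameter taken from the paper.
    """
    return np.concatenate([normalize(fg), bg_weight * normalize(bg)])


def rank1_accuracy(query_feats, query_ids, gallery_feats, gallery_ids):
    """Fraction of queries whose nearest gallery entry (Euclidean) has the correct identity."""
    q = np.asarray(query_feats, dtype=float)
    g = np.asarray(gallery_feats, dtype=float)
    # Pairwise squared distances via broadcasting: shape (num_queries, num_gallery).
    dists = ((q[:, None, :] - g[None, :, :]) ** 2).sum(axis=-1)
    nearest = dists.argmin(axis=1)  # ties resolve to the lowest gallery index
    return float(np.mean(np.asarray(gallery_ids)[nearest] == np.asarray(query_ids)))


def relative_improvement(new, baseline):
    """Relative gain (new - baseline) / baseline; 0.2825 would correspond to 28.25%."""
    return (new - baseline) / baseline


# Toy scene: two identical chairs (same foreground) in different surroundings.
gallery = {  # id -> (foreground embedding, background embedding); made-up numbers
    0: ([1.0, 0.0], [1.0, 0.0]),  # chair near a window
    1: ([1.0, 0.0], [0.0, 1.0]),  # identical chair near a door
}
queries = [  # (true id, foreground, background) seen from a new camera viewpoint
    (0, [0.95, 0.05], [0.9, 0.1]),
    (1, [0.90, 0.10], [0.1, 0.9]),
]

g_ids = list(gallery)
q_ids = [qid for qid, _, _ in queries]

# Baseline: match on foreground appearance only.
fg_only = rank1_accuracy(
    [normalize(fg) for _, fg, _ in queries], q_ids,
    [normalize(fg) for fg, _ in gallery.values()], g_ids,
)
# Joint: match on foreground plus background context.
joint = rank1_accuracy(
    [joint_descriptor(fg, bg) for _, fg, bg in queries], q_ids,
    [joint_descriptor(fg, bg) for fg, bg in gallery.values()], g_ids,
)
print(f"foreground only: {fg_only:.2f}, joint: {joint:.2f}")
print(f"relative improvement: {relative_improvement(joint, fg_only):.0%}")
```

In this toy scene the foreground embeddings of the two chairs are identical, so foreground-only matching sends the second query to the wrong instance (rank-1 of 0.50), while appending background context resolves both queries (1.00). The resulting 100% relative gain only illustrates the arithmetic and says nothing about the figure reported for ScanNet.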