MOVE: Unsupervised Movable Object Segmentation and Detection
This work addresses the problem of reducing annotation costs for object segmentation and detection, though it is incremental in that it builds on existing self-supervised features and inpainting networks.
The paper tackles unsupervised object segmentation and detection by exploiting the fact that foreground objects can be shifted locally and still yield realistic images. It reports state-of-the-art performance, with an average CorLoc improvement of 7.2% in single object discovery and a relative AP improvement of 53% on average in class-agnostic detection.
We introduce MOVE, a novel method to segment objects without any form of supervision. MOVE exploits the fact that foreground objects can be shifted locally relative to their initial position and still result in realistic (undistorted) new images. This property allows us to train a segmentation model on a dataset of images without annotation and to achieve state-of-the-art (SotA) performance on several evaluation datasets for unsupervised salient object detection and segmentation. In unsupervised single object discovery, MOVE gives an average CorLoc improvement of 7.2% over the SotA, and in unsupervised class-agnostic object detection it gives a relative AP improvement of 53% on average. Our approach is built on top of self-supervised features (e.g., from DINO or MAE), an inpainting network (based on the Masked AutoEncoder), and adversarial training.
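The shift-and-composite idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `shift_composite`, the grayscale list-of-lists image format, and the integer pixel shift are all assumptions made for illustration. In the method itself the background would come from the inpainting network and realism would be judged through adversarial training; here the background is simply passed in as an argument.

```python
def shift_composite(image, mask, background, dx, dy):
    """Paste the masked foreground of `image`, shifted by (dx, dy), onto `background`.

    image, background: H x W lists of numbers (grayscale, for simplicity).
    mask: H x W list of values in [0, 1], where 1 marks foreground.
    Foreground pixels shifted outside the frame are dropped.
    All names here are hypothetical, chosen only for this sketch.
    """
    h, w = len(image), len(image[0])
    # Start from a copy of the (assumed already inpainted) background.
    out = [row[:] for row in background]
    for y in range(h):
        for x in range(w):
            ty, tx = y + dy, x + dx  # target location after the shift
            if 0 <= ty < h and 0 <= tx < w:
                m = mask[y][x]
                # Alpha-blend the shifted foreground pixel over the background.
                out[ty][tx] = m * image[y][x] + (1.0 - m) * background[ty][tx]
    return out


# Toy usage: a single bright foreground pixel moved one step to the right.
image = [[1, 1, 1], [1, 9, 1], [1, 1, 1]]
mask = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
background = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
composite = shift_composite(image, mask, background, dx=1, dy=0)
```

If the mask is accurate, the composite looks like a plausible image with the object slightly displaced. If the mask is too small or too large, the shift cuts the object or drags background along with it, which is the kind of distortion a discriminator could learn to detect.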