CV · Aug 25, 2022

Refine and Represent: Region-to-Object Representation Learning

Akash Gokul, Konstantinos Kallidromitis, Shufan Li, Yusuke Kato, Kazuki Kozuka, Trevor Darrell, Colorado J Reed

arXiv:2208.11821v2 · 5.76 citations · h-index: 156 · Has Code

Originality: Incremental advance

AI Analysis

This paper targets dense prediction tasks such as segmentation, a problem of interest to computer vision researchers. The contribution appears incremental because it builds on existing region-based and object-centric pretraining methods.

The paper unifies region-based and object-centric pretraining in self-supervised learning by introducing Region-to-Object Representation Learning (R2O), which refines regions into object masks and jointly learns representations of the mask contents. It reports state-of-the-art results, including +0.7 mIoU on PASCAL VOC and +2.9 mIoU on Caltech-UCSD Birds.

Recent works in self-supervised learning have demonstrated strong performance on scene-level dense prediction tasks by pretraining with object-centric or region-based correspondence objectives. In this paper, we present Region-to-Object Representation Learning (R2O), which unifies region-based and object-centric pretraining. R2O operates by training an encoder to dynamically refine region-based segments into object-centric masks and then jointly learning representations of the contents within each mask. R2O uses a "region refinement module" to group small image regions, generated using a region-level prior, into larger regions that tend to correspond to objects by clustering region-level features. As pretraining progresses, R2O follows a region-to-object curriculum that encourages learning region-level features early on and gradually progresses to training object-centric representations. Representations learned using R2O lead to state-of-the-art performance in semantic segmentation on PASCAL VOC (+0.7 mIoU) and Cityscapes (+0.4 mIoU) and in instance segmentation on MS COCO (+0.3 mask AP). Further, after pretraining on ImageNet, R2O-pretrained models surpass the existing state of the art in unsupervised object segmentation on the Caltech-UCSD Birds 200-2011 dataset (+2.9 mIoU) without any further training. We provide the code/models from this work at https://github.com/KKallidromitis/r2o.
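
The grouping step described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: it assumes region-level features are the mean of per-pixel encoder features, it swaps in a plain k-means for whatever clustering the paper uses, and it models the region-to-object curriculum as a linear decrease in the number of masks. All function names and the `k_start`/`k_end` values are hypothetical.

```python
import numpy as np

def region_mean_features(features, region_ids, num_regions):
    """Average per-pixel features inside each small region.
    features: (H, W, C) floats; region_ids: (H, W) ints in [0, num_regions)."""
    flat_feats = features.reshape(-1, features.shape[-1])
    flat_ids = region_ids.reshape(-1)
    sums = np.zeros((num_regions, flat_feats.shape[1]))
    np.add.at(sums, flat_ids, flat_feats)  # unbuffered scatter-add per region
    counts = np.bincount(flat_ids, minlength=num_regions).astype(float)
    return sums / np.maximum(counts, 1.0)[:, None]

def kmeans(points, k, iters=10, seed=0):
    """Plain k-means (an assumption, standing in for the paper's clustering)."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(1)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[j] = members.mean(0)
    return labels

def refine_regions(features, region_ids, num_regions, k):
    """Group small regions into k larger masks by clustering region features."""
    region_feats = region_mean_features(features, region_ids, num_regions)
    region_to_mask = kmeans(region_feats, k)
    return region_to_mask[region_ids]  # (H, W) map of refined mask ids

def num_masks_schedule(step, total_steps, k_start=128, k_end=4):
    """Hypothetical curriculum: many small masks early, few large masks late."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return int(round(k_start + frac * (k_end - k_start)))
```

In a training loop under these assumptions, one would call `refine_regions(features, region_ids, num_regions, num_masks_schedule(step, total_steps))` at each step, so early masks stay close to the region-level prior and later masks merge into fewer, more object-like groups.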
