CVSep 24, 2019

Segmentation Transformer: Object-Contextual Representations for Semantic Segmentation

Yuhui Yuan, Xiaokang Chen, Xilin Chen, Jingdong Wang

arXiv:1909.11065v639.51695 citationsHas Code

Originality Highly original

AI Analysis

This addresses the problem of context aggregation in semantic segmentation for computer vision applications, offering a novel method with strong empirical gains.

The paper tackles semantic segmentation by proposing object-contextual representations, which characterize pixels using object class information, achieving competitive performance on benchmarks like Cityscapes and ADE20K, including a first-place result on the Cityscapes leaderboard.

In this paper, we address the semantic segmentation problem with a focus on the context aggregation strategy. Our motivation is that the label of a pixel is the category of the object that the pixel belongs to. We present a simple yet effective approach, object-contextual representations, characterizing a pixel by exploiting the representation of the corresponding object class. First, we learn object regions under the supervision of ground-truth segmentation. Second, we compute the object region representation by aggregating the representations of the pixels lying in the object region. Last, % the representation similarity we compute the relation between each pixel and each object region and augment the representation of each pixel with the object-contextual representation which is a weighted aggregation of all the object region representations according to their relations with the pixel. We empirically demonstrate that the proposed approach achieves competitive performance on various challenging semantic segmentation benchmarks: Cityscapes, ADE20K, LIP, PASCAL-Context, and COCO-Stuff. Cityscapes, ADE20K, LIP, PASCAL-Context, and COCO-Stuff. Our submission "HRNet + OCR + SegFix" achieves 1-st place on the Cityscapes leaderboard by the time of submission. Code is available at: https://git.io/openseg and https://git.io/HRNet.OCR. We rephrase the object-contextual representation scheme using the Transformer encoder-decoder framework. The details are presented in~Section3.3.

View on arXiv PDF Code

Similar