CV
Aug 27, 2021

SIGN: Spatial-information Incorporated Generative Network for Generalized Zero-shot Semantic Segmentation

arXiv:2108.12517v1
68 citations
Originality: Incremental advance
AI Analysis

This work addresses pixel-level prediction for unseen classes in generalized zero-shot semantic segmentation, and is rated an incremental advance.

The paper tackles zero-shot semantic segmentation by incorporating spatial information via Relative Positional Encoding and improving self-training with Annealed Self-Training, achieving enhanced performance on three benchmark datasets.

Unlike conventional zero-shot classification, zero-shot semantic segmentation predicts a class label at the pixel level instead of the image level. When solving zero-shot semantic segmentation problems, the need for pixel-level prediction with surrounding context motivates us to incorporate spatial information using positional encoding. We improve standard positional encoding by introducing the concept of Relative Positional Encoding, which integrates spatial information at the feature level and can handle arbitrary image sizes. Furthermore, while self-training is widely used in zero-shot semantic segmentation to generate pseudo-labels, we propose a new knowledge-distillation-inspired self-training strategy, namely Annealed Self-Training, which can automatically assign different importance to pseudo-labels to improve performance. We systematically study the proposed Relative Positional Encoding and Annealed Self-Training in a comprehensive experimental evaluation, and our empirical results confirm the effectiveness of our method on three benchmark datasets.
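The two ideas named in the abstract can be illustrated with a rough sketch. This is not the paper's implementation: the abstract gives no formulas, so everything below is an assumption made for illustration. The function names (relative_positional_encoding, annealed_pseudo_label), the sinusoidal form, the normalization of coordinates to [0, 1], and the linear temperature schedule with its direction and default values are all hypothetical. The sketch only shows why an encoding built from relative coordinates is independent of image size, and how a softened softmax can turn pseudo-labels into per-pixel importance weights.

```python
import math


def relative_positional_encoding(height, width, channels=8):
    """Sinusoidal encoding of coordinates normalized to [0, 1].

    Assumption: dividing by the grid size is one simple way to make the
    encoding depend on relative position only, so any feature-map size
    yields comparable codes. The paper's formulation may differ.
    """
    if channels % 4 != 0:
        raise ValueError("channels must be a multiple of 4")
    n_freq = channels // 4
    grid = []
    for i in range(height):
        y = i / max(height - 1, 1)  # relative row position in [0, 1]
        row = []
        for j in range(width):
            x = j / max(width - 1, 1)  # relative column position in [0, 1]
            vec = []
            for k in range(n_freq):
                freq = (2 ** k) * math.pi
                vec += [math.sin(freq * y), math.cos(freq * y),
                        math.sin(freq * x), math.cos(freq * x)]
            row.append(vec)
        grid.append(row)
    return grid


def softmax(logits, temperature):
    """Numerically stable softmax with a temperature."""
    m = max(logits)
    exps = [math.exp((v - m) / temperature) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def annealed_pseudo_label(logits, step, total_steps, t_start=4.0, t_end=1.0):
    """Return (pseudo_label, weight) for one pixel.

    Assumption: the temperature moves linearly from t_start to t_end, and
    the softened probability of the predicted class serves as the
    importance weight of that pseudo-label.
    """
    frac = step / max(total_steps - 1, 1)
    temperature = t_start + (t_end - t_start) * frac
    probs = softmax(logits, temperature)
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, probs[label]


# Usage: the bottom-right corner gets the same code at any resolution.
small = relative_positional_encoding(3, 5)
large = relative_positional_encoding(7, 9)
same_corner = small[2][4] == large[6][8]

# Usage: the same logits receive a larger weight as the temperature drops.
label_early, weight_early = annealed_pseudo_label([2.0, 1.0, 0.0], 0, 10)
label_late, weight_late = annealed_pseudo_label([2.0, 1.0, 0.0], 9, 10)
```

Under these assumptions, early pseudo-labels contribute less to the loss than later ones, which is one way to "automatically assign different importance" to them; the actual weighting rule should be taken from the paper itself.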

