CV · Aug 10, 2023

Category Feature Transformer for Semantic Segmentation

arXiv:2308.05581v1 · 5 citations · h-index: 54
Originality: Incremental advance
AI Analysis

This work addresses feature aggregation for semantic segmentation and proposes a method that improves both accuracy and efficiency. The advance is incremental, since it builds on existing multi-head attention and feature pyramid structures.

The paper tackles feature aggregation in semantic segmentation by proposing the Category Feature Transformer (CFT), which uses multi-head attention to learn category embeddings from high-level features and broadcast them to high-resolution features. It reports 55.1% mIoU on the ADE20K dataset with fewer parameters and less computation.
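The headline metric, mIoU (mean intersection-over-union), averages the per-class score IoU = TP / (TP + FP + FN) over all classes. The sketch below computes it for one pair of label maps with NumPy. It is illustrative only: benchmark toolkits typically accumulate the confusion matrix over the whole validation set before dividing, and the `ignore_index=255` convention for unlabeled pixels is an assumption about the label encoding, not something stated on this page.

```python
import numpy as np


def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean intersection-over-union across classes.

    pred, target: integer label maps of the same shape.
    Pixels labelled ignore_index in target are excluded.
    """
    mask = target != ignore_index
    pred, target = pred[mask], target[mask]
    # Confusion matrix: rows = ground-truth class, columns = predicted class.
    conf = np.bincount(
        num_classes * target + pred, minlength=num_classes ** 2
    ).reshape(num_classes, num_classes)
    intersection = np.diag(conf)                                # true positives
    union = conf.sum(axis=0) + conf.sum(axis=1) - intersection  # TP + FP + FN
    present = union > 0  # skip classes absent from both prediction and ground truth
    return (intersection[present] / union[present]).mean()


pred = np.array([[0, 0, 1], [1, 2, 2]])
target = np.array([[0, 1, 1], [1, 2, 255]])
print(round(mean_iou(pred, target, num_classes=3), 4))  # 0.7222
```

Here the per-class IoUs are 1/2, 2/3 and 1, so the mean is 13/18.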

Aggregation of multi-stage features has been revealed to play a significant role in semantic segmentation. Unlike previous methods employing point-wise summation or concatenation for feature aggregation, this study proposes the Category Feature Transformer (CFT) that explores the flow of category embedding and transformation among multi-stage features through the prevalent multi-head attention mechanism. CFT learns unified feature embeddings for individual semantic categories from high-level features during each aggregation process and dynamically broadcasts them to high-resolution features. Integrating the proposed CFT into a typical feature pyramid structure exhibits superior performance over a broad range of backbone networks. We conduct extensive experiments on popular semantic segmentation benchmarks. Specifically, the proposed CFT obtains a compelling 55.1% mIoU with greatly reduced model parameters and computations on the challenging ADE20K dataset.
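The abstract describes two steps: pooling one embedding per semantic category from high-level features, then broadcasting those embeddings to high-resolution features through multi-head attention. The NumPy sketch below illustrates that flow under stated assumptions; it is not the authors' implementation. The abstract does not say how the embeddings are formed, so the sketch assumes they are class-probability-weighted averages of low-resolution features, and that broadcasting is cross-attention from high-resolution pixels (queries) to category embeddings (keys/values), added back residually. All function names, the random projection weights, and the class logits are hypothetical placeholders.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def category_embeddings(low_feat, class_logits):
    """Pool one embedding per category from high-level (low-resolution) features.

    low_feat:     (N_low, C) flattened high-level feature map
    class_logits: (N_low, K) per-pixel class scores (assumed to come from a
                  hypothetical 1x1 classifier head)
    returns:      (K, C), each row a weighted average of pixel features
    """
    # Softmax over pixels: each category gets a spatial weight map summing to 1.
    weights = softmax(class_logits.T, axis=-1)  # (K, N_low)
    return weights @ low_feat                   # (K, C)


def multi_head_cross_attention(queries, keys_values, w_q, w_k, w_v, w_o, num_heads):
    """Queries (N, C) attend over keys/values (K, C) with num_heads heads."""
    n, c = queries.shape
    d = c // num_heads
    # Project and split channels into heads: (H, N, d) and (H, K, d).
    q = (queries @ w_q).reshape(n, num_heads, d).transpose(1, 0, 2)
    k = (keys_values @ w_k).reshape(-1, num_heads, d).transpose(1, 0, 2)
    v = (keys_values @ w_v).reshape(-1, num_heads, d).transpose(1, 0, 2)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d), axis=-1)  # (H, N, K)
    # Merge heads back to (N, C) and apply the output projection.
    out = (attn @ v).transpose(1, 0, 2).reshape(n, c)
    return out @ w_o


def cft_aggregate(high_res_feat, low_res_feat, class_logits, params, num_heads=4):
    """Hypothetical CFT-style aggregation step: pool category embeddings from
    the low-resolution map, broadcast them to every high-resolution pixel via
    cross-attention, and add the result residually."""
    emb = category_embeddings(low_res_feat, class_logits)
    broadcast = multi_head_cross_attention(high_res_feat, emb, *params, num_heads=num_heads)
    return high_res_feat + broadcast


# Toy usage: 150 categories (as in ADE20K), 64 channels, 8x8 -> 32x32 pixels.
rng = np.random.default_rng(0)
C, K = 64, 150
high = rng.standard_normal((32 * 32, C))
low = rng.standard_normal((8 * 8, C))
logits = rng.standard_normal((8 * 8, K))
params = [rng.standard_normal((C, C)) * 0.1 for _ in range(4)]  # w_q, w_k, w_v, w_o
out = cft_aggregate(high, low, logits, params)
print(out.shape)  # (1024, 64)
```

In this sketch each high-resolution pixel attends over a fixed set of K category embeddings, so the attention matrix per head is N x K regardless of the low-resolution map's size.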

Code Implementations: 1 repo