CV · AI · Mar 23, 2021

Dilated SpineNet for Semantic Segmentation

arXiv:2103.12270v1 · 3 citations · Has Code
Originality: Incremental advance
AI Analysis

This work addresses semantic segmentation, reporting incremental improvements over the DeepLabv3/v3+ baselines.

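The paper tackles semantic segmentation by proposing SpineNet-Seg, a scale-permuted network with customized per-block dilation ratios, discovered through neural architecture search (NAS) starting from the DeepLabv3 system. SpineNet-Seg models outperform the DeepLabv3/v3+ baselines in both speed and accuracy at all model scales; the SpineNet-S143+ model reaches 83.04% mIoU on Cityscapes and 85.56% mIoU on PASCAL VOC2012.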
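The mIoU figures quoted above are mean intersection-over-union scores. As a minimal sketch of how that metric is commonly computed, the following pure-Python example averages per-class IoU over a toy label sequence. The function name `mean_iou` and the toy data are illustrative assumptions, not taken from the paper's code; benchmark evaluations typically accumulate these counts over an entire validation set rather than a single image.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over the classes present in pred or target.

    pred, target: flat sequences of integer class labels of equal length.
    Classes that appear in neither sequence are skipped (illustrative choice).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    ious = []
    for c in range(num_classes):
        # Pixels labelled c by both prediction and ground truth.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels labelled c by either prediction or ground truth.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: six "pixels", three classes.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(f"mIoU = {score:.4f}")
```

Here the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is 0.5.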

Scale-permuted networks have shown promising results on object bounding box detection and instance segmentation. Scale permutation and cross-scale fusion of features enable the network to capture multi-scale semantics while preserving spatial resolution. In this work, we evaluate this meta-architecture design on semantic segmentation, another vision task that benefits from high spatial resolution and multi-scale feature fusion at different network stages. By further leveraging dilated convolution operations, we propose SpineNet-Seg, a network discovered by neural architecture search (NAS) that is searched from the DeepLabv3 system. SpineNet-Seg is designed with a better scale-permuted network topology with customized dilation ratios per block on a semantic segmentation task. SpineNet-Seg models outperform the DeepLabv3/v3+ baselines at all model scales on multiple popular benchmarks in speed and accuracy. In particular, our SpineNet-S143+ model achieves a new state of the art on the popular Cityscapes benchmark at 83.04% mIoU and attains strong performance on the PASCAL VOC2012 benchmark at 85.56% mIoU. SpineNet-Seg models also show promising results on a challenging Street View segmentation dataset. Code and checkpoints will be open-sourced.
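The abstract leans on dilated convolutions to keep spatial resolution while widening context. As a rough illustration only (a 1D toy, not the SpineNet-Seg implementation), the sketch below shows how a dilation rate spaces the kernel taps apart, so a 3-tap kernel with dilation 3 covers 7 input positions while the output keeps the input length. The names `dilated_conv1d` and `receptive_field` are hypothetical helpers introduced for this example.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """1D cross-correlation with zero padding; output length equals input length.

    Kernel taps are spaced `dilation` positions apart around the centre tap.
    """
    k = len(kernel)
    if k % 2 == 0:
        raise ValueError("kernel length must be odd for a centred kernel")
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            # Offset of tap j from the centre, stretched by the dilation rate.
            idx = i + (j - k // 2) * dilation
            if 0 <= idx < n:  # positions outside the signal count as zero
                acc += w * signal[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilation):
    """Number of input positions spanned by one dilated kernel."""
    return (kernel_size - 1) * dilation + 1


# A unit impulse makes the tap spacing visible in the output.
signal = [0, 0, 0, 0, 1, 0, 0, 0, 0]
kernel = [1, 1, 1]
dense = dilated_conv1d(signal, kernel, dilation=1)
sparse = dilated_conv1d(signal, kernel, dilation=3)
print(dense)
print(sparse)
print(receptive_field(3, 3))
```

With dilation 1 the impulse response occupies three adjacent positions; with dilation 3 the same three taps land three positions apart, and in both cases the output has as many samples as the input.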

