CV · AI · Apr 26, 2023

Exploiting CNNs for Semantic Segmentation with Pascal VOC

arXiv:2304.13216v2 · h-index: 6
Originality: Synthesis-oriented
AI Analysis

This is an incremental study for researchers in computer vision, focusing on optimizing semantic segmentation methods for a specific dataset.

The paper tackles semantic segmentation on the Pascal VOC dataset. It evaluates a baseline Fully Convolutional Network (FCN), then tests three training improvements (cosine annealing, data augmentation, and class imbalance weights) and three architectures (a proposed Advanced FCN, transfer learning with ResNet, and U-Net). Transfer learning achieves the best mean IoU, 0.0926, at 71.33% pixel accuracy.

In this paper, we present a comprehensive study on semantic segmentation with the Pascal VOC dataset. The task is to label each pixel with a class, which in turn segments the entire image based on the objects/entities present. To tackle this, we first use a Fully Convolutional Network (FCN) baseline, which gives 71.31% pixel accuracy and 0.0527 mean IoU. We analyze its performance and working, and subsequently address the issues in the baseline with three improvements: a) cosine annealing learning rate scheduler (pixel accuracy: 72.86%, IoU: 0.0529), b) data augmentation (pixel accuracy: 69.88%, IoU: 0.0585), and c) class imbalance weights (pixel accuracy: 68.98%, IoU: 0.0596). Apart from these changes in the training pipeline, we also explore three different architectures: a) our proposed model, Advanced FCN (pixel accuracy: 67.20%, IoU: 0.0602), b) transfer learning with ResNet (best performance; pixel accuracy: 71.33%, IoU: 0.0926), and c) U-Net (pixel accuracy: 72.15%, IoU: 0.0649). We observe that the improvements greatly improve performance, as reflected in both the metrics and the segmentation maps. Interestingly, we observe that among the improvements, data augmentation has the greatest contribution. Also, note that the transfer learning model performs the best on the Pascal VOC dataset. We analyse the performance of these models using loss, accuracy, and IoU plots along with segmentation maps, which help us draw valuable insights about the working of the models.
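The two metrics quoted throughout the abstract can diverge sharply, which may help explain pixel accuracies near 70% alongside mean IoU values below 0.1. The sketch below is a minimal, pure-Python illustration of how pixel accuracy and mean IoU are commonly computed from a confusion matrix. It is not the authors' code: the function names are hypothetical, and the choice to skip classes that appear in neither prediction nor ground truth is an assumption, since the abstract does not say how the paper averages IoU.

```python
from typing import List, Sequence


def confusion_matrix(pred: Sequence[int], target: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Count pixels; rows are ground-truth classes, columns are predictions."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        cm[t][p] += 1
    return cm


def pixel_accuracy(cm: List[List[int]]) -> float:
    """Fraction of all pixels whose predicted class matches the ground truth."""
    correct = sum(cm[c][c] for c in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total if total else 0.0


def mean_iou(cm: List[List[int]]) -> float:
    """Average of per-class TP / (TP + FP + FN).

    Assumption: classes absent from both prediction and ground truth
    are skipped rather than counted as zero.
    """
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # true class c, predicted otherwise
        fp = sum(cm[r][c] for r in range(n)) - tp  # predicted c, true class differs
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy flattened "image": 8 background pixels (class 0) and 2 object pixels
# (class 1). The model predicts background everywhere.
target = [0] * 8 + [1] * 2
pred = [0] * 10
cm = confusion_matrix(pred, target, num_classes=3)
print(pixel_accuracy(cm))  # 0.8
print(mean_iou(cm))        # 0.4
```

In this toy case, predicting only the dominant class still yields 80% pixel accuracy, while the missed object class pulls mean IoU down to 0.4. With many rarely predicted classes, the gap can become much larger.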
