CV, Oct 11, 2023

Impact of Label Types on Training SWIN Models with Overhead Imagery

Ryan Ford, Kenneth Hutchison, Nicholas Felts, Benjamin Cheng, Jesse Lew, Kyle Jackson

arXiv:2310.07572v1
h-index: 9

Originality: Synthesis-oriented

AI Analysis

This work addresses the high cost of generating labeled remote sensing data for deep learning practitioners, suggesting bounding boxes may be sufficient for many tasks.

This study investigated whether expensive segmentation labels provide performance benefits over cheaper bounding box labels when training SWIN models on overhead imagery. The researchers found that for classification tasks, models trained only on target pixels (from segmentation labels) showed no improvement, while for object detection, both label types produced equivalent performance.
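The "target pixels only" condition can be pictured as masking out everything the segmentation label marks as background. The sketch below is a minimal illustration, not the authors' pipeline. It assumes a chip stored as nested lists (height × width × channels), a binary mask (1 = target, 0 = background), and a fill value of 0 for background pixels. The summary does not say how background pixels were actually removed, and `target_only` is a hypothetical name.

```python
from typing import List

# Assumed representations (hypothetical, for illustration only):
Image = List[List[List[int]]]  # H x W x C pixel values, e.g. an RGB chip
Mask = List[List[int]]         # H x W, 1 = target pixel, 0 = background


def target_only(image: Image, mask: Mask, fill: int = 0) -> Image:
    """Return a copy of `image` in which background pixels (mask == 0)
    are replaced by `fill` in every channel; target pixels are kept.

    The fill value of 0 is an assumption; the source does not specify it.
    """
    if len(image) != len(mask) or any(
        len(row) != len(mrow) for row, mrow in zip(image, mask)
    ):
        raise ValueError("image and mask must have the same height and width")
    return [
        [list(px) if keep else [fill] * len(px) for px, keep in zip(row, mrow)]
        for row, mrow in zip(image, mask)
    ]


# Usage: a 2 x 2 RGB chip whose target occupies the anti-diagonal.
chip = [[[10, 20, 30], [40, 50, 60]], [[70, 80, 90], [100, 110, 120]]]
mask = [[0, 1], [1, 0]]
masked = target_only(chip, mask)
# masked == [[[0, 0, 0], [40, 50, 60]], [[70, 80, 90], [0, 0, 0]]]
```

A classifier trained on chips like `masked` never sees background texture. This is consistent with the reported finding that such models appear to confuse background pixels in the evaluation set with target pixels.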

Understanding the impact of data set design on model training and performance can help alleviate the costs of generating labeled remote sensing and overhead data. This work examined the impact of training shifted window (SWIN) transformers using bounding boxes versus segmentation labels, where the latter are more expensive to produce. For classification tasks, we compared models trained on both target and background pixels against models trained on only target pixels, extracted using segmentation labels. For object detection models, we compared performance when training with either label type. We found that models trained on only target pixels do not show a performance improvement for classification tasks and appear to conflate background pixels in the evaluation set with target pixels. For object detection, models trained with either label type showed equivalent performance across testing. Bounding boxes therefore appeared to be sufficient for tasks that, unlike object segmentation, do not require more complex labels. Further work to determine whether this result holds across data types and model architectures could yield substantial savings in generating remote sensing data sets for deep learning.
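The two label types differ in how much information they carry. A segmentation mask determines a bounding box, but a box also encloses background pixels. The sketch below illustrates this relationship under stated assumptions: a binary mask stored as nested lists, boxes given as inclusive `(x_min, y_min, x_max, y_max)` pixel indices, and the hypothetical helpers `mask_to_box` and `box_background_fraction`. The source does not say whether the authors derived their boxes from masks or labeled them separately.

```python
from typing import List, Optional, Tuple

Mask = List[List[int]]          # H x W, nonzero = target pixel (assumed)
Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive


def mask_to_box(mask: Mask) -> Optional[Box]:
    """Tightest axis-aligned box around all nonzero mask pixels, or None
    if the mask contains no target pixels."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, v in enumerate(row) if v]
    if not ys:
        return None
    return (min(xs), min(ys), max(xs), max(ys))


def box_background_fraction(mask: Mask, box: Box) -> float:
    """Fraction of the pixels inside `box` that the mask marks as background,
    i.e. the background a box label includes but a segmentation label excludes."""
    x0, y0, x1, y1 = box
    area = (x1 - x0 + 1) * (y1 - y0 + 1)
    target = sum(
        1 for y in range(y0, y1 + 1) for x in range(x0, x1 + 1) if mask[y][x]
    )
    return 1.0 - target / area


# Usage: an L-shaped target in a 4 x 4 chip.
mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
box = mask_to_box(mask)                   # (1, 1, 2, 2)
bg = box_background_fraction(mask, box)   # 0.25: one of four box pixels is background
```

For detection, the comparison in the abstract amounts to asking whether the extra precision a mask adds over `box` changes model performance. The reported answer is that it did not.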

