Abbas Rajabifard

CV · Mar 16, 2023

Dual skip connections in U-Net, ResUnet and U-Net3+ for remote extraction of buildings

Bipul Neupane, Jagannath Aryal, Abbas Rajabifard

Urban buildings are extracted from high-resolution Earth observation (EO) images using semantic segmentation networks like U-Net and its successors. Each successive iteration aims to improve performance by employing a denser skip connection mechanism that harnesses multi-scale features for accurate object mapping. However, denser connections increase network parameters and do not necessarily contribute to precise segmentation. In this paper, we develop three dual skip connection mechanisms for three networks (U-Net, ResUnet, and U-Net3+) to selectively deepen the essential feature maps for improved performance. The three mechanisms are evaluated on feature maps of different scales, producing nine new network configurations. They are evaluated against their original vanilla configurations on four building footprint datasets of different spatial resolutions, including a multi-resolution (0.3+0.6+1.2 m) dataset that we develop for complex urban environments. The evaluation revealed that densifying the large- and small-scale features in U-Net and U-Net3+ produces up to 0.905 F1, higher than TransUnet (0.903) and Swin-Unet (0.882) on our new dataset, with up to 19x fewer parameters. The results show that selectively densifying feature maps and skip connections enhances network performance without a substantial increase in parameters. The findings and the new dataset will contribute to the computer vision domain and urban planning decision processes.
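The F1 scores quoted in this abstract compare predicted building masks with reference masks. As a rough illustration only, the sketch below computes a pixel-wise F1 for two binary masks. The function name `f1_score`, the nested-list mask format, and the toy values are assumptions made for this example; the paper's own evaluation code and averaging scheme are not given here.

```python
def f1_score(pred, truth):
    """Pixel-wise F1 for two binary masks given as nested lists of 0/1."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1          # building pixel correctly predicted
            elif p and not t:
                fp += 1          # predicted building, actually background
            elif t and not p:
                fn += 1          # missed building pixel
    denom = 2 * tp + fp + fn
    # Convention chosen for this sketch: two empty masks count as a perfect match.
    return 1.0 if denom == 0 else 2 * tp / denom


# Toy 3x3 masks (made-up values, not taken from any dataset in the paper).
truth = [[1, 1, 0],
         [1, 1, 0],
         [0, 0, 0]]
pred = [[1, 1, 0],
        [1, 0, 0],
        [0, 1, 0]]
score = f1_score(pred, truth)  # 3 true positives, 1 false positive, 1 false negative
```

With these toy masks the score is 2·3 / (2·3 + 1 + 1) = 0.75. F1 is the harmonic mean of precision and recall, so it penalises both missed building pixels and false detections.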

CV · Nov 7, 2023

Supervised domain adaptation for building extraction from off-nadir aerial images

Bipul Neupane, Jagannath Aryal, Abbas Rajabifard

Building extraction, needed for inventory management and planning of urban environments, is affected by the misalignment between labels and off-nadir source imagery in training data. Teacher-Student learning of noise-tolerant convolutional neural networks (CNNs) is the existing solution, but the Student networks typically have lower accuracy and cannot surpass the Teacher's performance. This paper proposes a supervised domain adaptation (SDA) of encoder-decoder networks (EDNs) between noisy and clean datasets to tackle the problem. EDNs are configured with high-performing lightweight encoders such as EfficientNet, ResNeSt, and MobileViT. The proposed method is compared against existing Teacher-Student learning methods such as knowledge distillation (KD) and deep mutual learning (DML) on three newly developed datasets. The methods are evaluated for different urban building types (low-rise, mid-rise, high-rise, and skyscrapers), where misalignment increases with building height and spatial resolution. For a robust experimental design, 43 lightweight CNNs, five optimisers, nine loss functions, and seven EDNs are benchmarked to obtain the best-performing EDN for SDA. The SDA of the best-performing EDN from our study significantly outperformed KD and DML, with F1 scores of up to 0.943, 0.868, 0.912, and 0.697 for low-rise, mid-rise, high-rise, and skyscraper buildings, respectively. The proposed method and the experimental findings will be beneficial in training robust CNNs for building extraction.
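For readers unfamiliar with the knowledge distillation baseline mentioned in this abstract, the sketch below shows the generic soft-target loss commonly used in KD: a temperature-scaled KL divergence between Teacher and Student class distributions. It is not the paper's configuration. The function names, the default temperature of 2.0, and the toy logits are all assumptions for illustration, and the loss is shown for a single pixel's class logits.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)                      # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft-target distillation term: T^2 * KL(teacher || student)."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    return temperature ** 2 * kl


# Hypothetical two-class logits (background, building) for one pixel.
teacher = [0.5, 2.0]
student = [1.0, 1.2]
loss = kd_loss(student, teacher)
```

The loss is zero when the Student reproduces the Teacher's distribution and grows as the two diverge. In practice this term is usually combined with an ordinary supervised loss on the labels.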

2 Papers