CV | Jun 15, 2022
S$^2$-FPN: Scale-ware Strip Attention Guided Feature Pyramid Network for Real-time Semantic Segmentation
Mohammed A. M. Elhassan, Chenhui Yang, Chenxi Huang et al.
Modern high-performance semantic segmentation methods employ a heavy backbone and dilated convolutions to extract relevant features. Although extracting features with both contextual and semantic information is critical for segmentation tasks, it incurs a large memory footprint and high computational cost, which is a problem for real-time applications. This paper presents a new model that balances accuracy and speed for real-time road scene semantic segmentation. Specifically, we propose a lightweight model named Scale-aware Strip Attention Guided Feature Pyramid Network (S$^2$-FPN). Our network consists of three main modules: the Attention Pyramid Fusion (APF) module, the Scale-aware Strip Attention Module (SSAM), and the Global Feature Upsample (GFU) module. APF adopts an attention mechanism to learn discriminative multi-scale features and help close the semantic gap between different levels. APF uses scale-aware attention to encode global context with a vertical stripping operation and to model long-range dependencies, which helps relate pixels with similar semantic labels. In addition, APF employs a channel-wise reweighting block (CRB) to emphasize the channel features. Finally, the decoder of S$^2$-FPN adopts GFU, which fuses features from APF and the encoder. Extensive experiments on two challenging semantic segmentation benchmarks demonstrate that our approach achieves a better accuracy/speed trade-off across different model settings. The proposed models achieve 76.2\% mIoU at 87.3 FPS, 77.4\% mIoU at 67 FPS, and 77.8\% mIoU at 30.5 FPS on the Cityscapes dataset, and 69.6\% mIoU, 71.0\% mIoU, and 74.2\% mIoU on the CamVid dataset. The code for this work will be made available at \url{https://github.com/mohamedac29/S2-FPN}.
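The accuracy figures above are reported as mIoU (mean intersection-over-union). As a rough illustration of how that metric is commonly computed, here is a minimal sketch over flattened label lists. The function name `mean_iou` and the `ignore_index=255` default are assumptions made for this example (255 is a common "void" label convention), not details taken from the paper or its repository.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over the classes that occur in pred or target.

    pred and target are flat sequences of integer class ids of equal length.
    Pixels whose target equals ignore_index are skipped (assumed convention).
    """
    inter = [0] * num_classes  # true positives per class
    union = [0] * num_classes  # TP + FP + FN per class
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        if p == t:
            inter[t] += 1
            union[t] += 1
        else:
            union[p] += 1  # false positive for the predicted class
            union[t] += 1  # false negative for the true class
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")


if __name__ == "__main__":
    pred = [0, 0, 1, 1, 2, 2]
    target = [0, 1, 1, 1, 2, 255]
    print(f"mIoU = {mean_iou(pred, target, num_classes=3):.4f}")
```

Benchmark toolkits usually accumulate the intersection and union counts over the whole evaluation set before dividing, rather than averaging per-image scores; the same function covers that case if all pixels are passed in at once.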
CV | Oct 23, 2023
P2AT: Pyramid Pooling Axial Transformer for Real-time Semantic Segmentation
Mohammed A. M. Elhassan, Changjun Zhou, Amina Benabid et al.
Recently, Transformer-based models have achieved promising results in various vision tasks due to their ability to model long-range dependencies. However, transformers are computationally expensive, which limits their application in real-time tasks such as autonomous driving. In addition, efficient local and global feature selection and fusion are vital for accurate dense prediction, especially in driving scene understanding tasks. In this paper, we propose a real-time semantic segmentation architecture named Pyramid Pooling Axial Transformer (P2AT). The proposed P2AT takes a coarse feature from the CNN encoder to produce scale-aware contextual features, which are then combined with a multi-level feature aggregation scheme to produce enhanced contextual features. Specifically, we introduce a pyramid pooling axial transformer to capture intricate spatial and channel dependencies, leading to improved performance on semantic segmentation. Then, we design a Bidirectional Fusion module (BiF) to combine semantic information at different levels. Meanwhile, a Global Context Enhancer is introduced to compensate for the inadequacy of concatenating different semantic levels. Finally, a decoder block is proposed to help maintain a larger receptive field. We evaluate P2AT variants on three challenging scene-understanding datasets. In particular, our P2AT variants achieve state-of-the-art results on the CamVid dataset: 80.5%, 81.0%, and 81.1% for P2AT-S, P2AT-M, and P2AT-L, respectively. Furthermore, our experiments on Cityscapes and Pascal VOC 2012 demonstrate the efficiency of the proposed architecture, with P2AT-M achieving 78.7% on Cityscapes. The source code will be available at
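The abstract does not say how the pyramid pooling stage is configured, so the following is only a generic sketch of pyramid pooling: one feature map is average-pooled onto several grid sizes to summarize context at different scales. The bin sizes `(1, 2, 4)`, the single-channel nested-list input, and the function names are assumptions for illustration; the axial transformer itself is not modeled here.

```python
import math


def adaptive_avg_pool2d(fmap, bins):
    """Average-pool a 2D map (list of equal-length rows) onto a bins x bins grid."""
    h, w = len(fmap), len(fmap[0])
    out = []
    for i in range(bins):
        # Bin i covers rows floor(i*h/bins) .. ceil((i+1)*h/bins) - 1.
        r0, r1 = (i * h) // bins, math.ceil((i + 1) * h / bins)
        row = []
        for j in range(bins):
            c0, c1 = (j * w) // bins, math.ceil((j + 1) * w / bins)
            cells = [fmap[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        out.append(row)
    return out


def pyramid_pool(fmap, bin_sizes=(1, 2, 4)):
    """Return one pooled grid per bin size (bin sizes are an assumed choice)."""
    return {b: adaptive_avg_pool2d(fmap, b) for b in bin_sizes}


if __name__ == "__main__":
    fmap = [[float(4 * r + c) for c in range(4)] for r in range(4)]
    for b, grid in pyramid_pool(fmap).items():
        print(b, grid)
```

In a real network the pooled grids would typically be projected, upsampled back to the input resolution, and fused with the original features; those steps are omitted to keep the sketch short.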
CV | Apr 4, 2022
Technical Report on Subspace Pyramid Fusion Network for Semantic Segmentation
Mohammed A. M. Elhassan, Chenhui Yang, Chenxi Huang et al.
The following is a technical report that tests the validity of the proposed Subspace Pyramid Fusion Module (SPFM) for capturing multi-scale feature representations, which are more useful for semantic segmentation. In this investigation, we propose the Efficient Shuffle Attention Module (ESAM) to reconstruct the skip-connection paths by fusing multi-level global context features. Experimental results on two well-known semantic segmentation datasets, CamVid and Cityscapes, show the effectiveness of our proposed method.
NI | Feb 17, 2024
Optimizing Wireless Networks with Deep Unfolding: Comparative Study on Two Deep Unfolding Mechanisms
Abuzar B. M. Adam, Mohammed A. M. Elhassan, Elhadj Moustapha Diallo
In this work, we conduct a comparative study of two deep unfolding mechanisms for efficient power control in next-generation wireless networks. The power control problem is formulated as an energy efficiency problem over multiple interference links, and it is nonconvex. We employ a fractional programming transformation to design two solutions to the problem: the first is a numerical solution, while the second is a closed-form solution. Based on the first solution, we design a semi-unfolding deep learning model that combines domain knowledge of wireless communications with recent advances in data-driven deep learning. Moreover, building on the closed-form solution, we design a fully unfolded deep learning model that fully leverages the expressive closed-form power control solution and deep learning advances. In the simulation results, we compare the proposed deep learning models and the iterative solutions in terms of accuracy and inference speed to show their suitability for real-time applications in next-generation networks.
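The abstract does not name the specific fractional programming transformation used. As a hedged illustration of the general idea, the sketch below applies Dinkelbach's classic transform to a much simpler setting than the paper's: a single link with no interference, where energy efficiency is rate divided by total power. The channel gain `g`, circuit power `p_circuit`, power budget `p_max`, and all function names are assumptions for this example only.

```python
import math


def rate(p, g):
    """Achievable rate in bits/s/Hz for transmit power p and channel gain g."""
    return math.log2(1.0 + g * p)


def dinkelbach_power(g, p_circuit, p_max, tol=1e-9, max_iter=100):
    """Maximize rate(p) / (p + p_circuit) over 0 <= p <= p_max (p_circuit > 0).

    Dinkelbach's method replaces the ratio with rate(p) - lam * (p + p_circuit),
    which is concave in p and has a closed-form maximizer for a fixed lam.
    """
    lam = 0.0
    p = p_max
    for _ in range(max_iter):
        if lam > 0.0:
            # Stationary point of the concave subproblem, clipped to the budget.
            p = 1.0 / (lam * math.log(2.0)) - 1.0 / g
            p = min(max(p, 0.0), p_max)
        else:
            p = p_max  # with lam = 0 the subproblem just maximizes the rate
        gap = rate(p, g) - lam * (p + p_circuit)
        if abs(gap) < tol:
            break
        lam = rate(p, g) / (p + p_circuit)  # update the ratio estimate
    return p, rate(p, g) / (p + p_circuit)


if __name__ == "__main__":
    p_opt, ee = dinkelbach_power(g=2.0, p_circuit=1.0, p_max=10.0)
    print(f"power = {p_opt:.4f}, energy efficiency = {ee:.4f}")
```

Each pass through the loop is one fixed-form update, which is the kind of iteration that deep unfolding approaches typically map to a network layer; how the paper's models do this for the multi-link case is not shown here.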