Isah Bello

CV
h-index: 10
3 papers
67 citations
Novelty: 32%
AI Score: 20

3 Papers

CV, Jul 10, 2024
iiANET: Inception Inspired Attention Hybrid Network for efficient Long-Range Dependency

Haruna Yunusa, Qin Shiyin, Abdulrahman Hamman Adama Chukkol et al.

The recent emergence of hybrid models has introduced a transformative approach to computer vision, gradually moving beyond conventional convolutional neural networks and vision transformers. However, efficiently combining these two paradigms to better capture long-range dependencies in complex images remains a challenge. In this paper, we present iiANET (Inception Inspired Attention Network), an efficient hybrid visual backbone designed to improve the modeling of long-range dependencies. The core innovation of iiANET is the iiABlock, a unified building block that integrates global r-MHSA (Multi-Head Self-Attention) and convolutional layers in parallel. This design enables iiABlock to simultaneously capture global context and local details, making it highly effective for extracting rich and diverse features. By efficiently fusing these complementary representations, iiABlock allows iiANET to achieve strong feature interaction while maintaining computational efficiency. Extensive qualitative and quantitative evaluations across various benchmarks show improved performance over several state-of-the-art models.

CV, Feb 5, 2024
Exploring the Synergies of Hybrid CNNs and ViTs Architectures for Computer Vision: A survey

Haruna Yunusa, Shiyin Qin, Abdulrahman Hamman Adama Chukkol et al.

The hybridization of Convolutional Neural Network (CNN) and Vision Transformer (ViT) architectures has emerged as a groundbreaking approach, pushing the boundaries of computer vision (CV). This comprehensive review provides a thorough examination of the literature on state-of-the-art hybrid CNN-ViT architectures, exploring the synergies between these two approaches. The main content of this survey includes: (1) a background on the vanilla CNN and ViT, (2) a systematic review of various taxonomic hybrid designs to explore the synergy achieved through merging CNN and ViT models, (3) a comparative analysis and application task-specific synergy between different hybrid architectures, (4) challenges and future directions for hybrid models, and (5) a concluding summary of key findings and recommendations. Through this exploration of hybrid CV architectures, the survey aims to serve as a guiding resource, fostering a deeper understanding of the intricate dynamics between CNNs and ViTs and their collective impact on shaping the future of CV architectures.

CV, Feb 26, 2024
SaRPFF: A Self-Attention with Register-based Pyramid Feature Fusion module for enhanced RLD detection

Yunusa Haruna, Shiyin Qin, Abdulrahman Hamman Adama Chukkol et al.

Detecting objects across varying scales is still a challenge in computer vision, particularly in agricultural applications like Rice Leaf Disease (RLD) detection, where objects exhibit significant scale variations (SV). Conventional object detection (OD) methods such as Faster R-CNN, SSD, and YOLO often fail to effectively address SV, leading to reduced accuracy and missed detections. To tackle this, we propose SaRPFF (Self-Attention with Register-based Pyramid Feature Fusion), a novel module designed to enhance multi-scale object detection. SaRPFF integrates 2D-Multi-Head Self-Attention (MHSA) with Register tokens, improving feature interpretability by mitigating artifacts within MHSA. Additionally, it integrates efficient attention atrous convolutions into the pyramid feature fusion and introduces a deconvolutional layer for refined up-sampling. We evaluate SaRPFF on YOLOv7 using the MRLD and COCO datasets. Our approach demonstrates a +2.61% improvement in Average Precision (AP) on the MRLD dataset compared to the baseline FPN method in YOLOv7. Furthermore, SaRPFF outperforms other FPN variants, including BiFPN, NAS-FPN, and PANET, showcasing its versatility and potential to advance OD techniques. This study highlights SaRPFF's effectiveness in addressing SV challenges and its adaptability across FPN-based OD models.