CVJul 10, 2024
iiANET: Inception Inspired Attention Hybrid Network for efficient Long-Range DependencyHaruna Yunusa, Qin Shiyin, Abdulrahman Hamman Adama Chukkol et al.
The recent emergence of hybrid models has introduced a transformative approach to computer vision, gradually moving beyond conventional convolutional neural net-works and vision transformers. However, efficiently combining these two paradigms to better capture long-range dependencies in complex images remains a challenge. In this paper, we present iiANET (Inception Inspired Attention Network), an efficient hybrid visual backbone designed to improve the modeling of long-range dependen-cies. The core innovation of iiANET is the iiABlock, a unified building block that in-tegrates global r-MHSA (Multi-Head Self-Attention) and convolutional layers in paral-lel. This design enables iiABlock to simultaneously capture global context and local details, making it highly effective for extracting rich and diverse features. By effi-ciently fusing these complementary representations, iiABlock allows iiANET to achieve strong feature interaction while maintaining computational efficiency. Exten-sive qualitative and quantitative evaluations across various benchmarks show im-proved performance over several state-of-the-art models.
CVFeb 5, 2024
Exploring the Synergies of Hybrid CNNs and ViTs Architectures for Computer Vision: A surveyHaruna Yunusa, Shiyin Qin, Abdulrahman Hamman Adama Chukkol et al.
The hybrid of Convolutional Neural Network (CNN) and Vision Transformers (ViT) architectures has emerged as a groundbreaking approach, pushing the boundaries of computer vision (CV). This comprehensive review provides a thorough examination of the literature on state-of-the-art hybrid CNN-ViT architectures, exploring the synergies between these two approaches. The main content of this survey includes: (1) a background on the vanilla CNN and ViT, (2) systematic review of various taxonomic hybrid designs to explore the synergy achieved through merging CNNs and ViTs models, (3) comparative analysis and application task-specific synergy between different hybrid architectures, (4) challenges and future directions for hybrid models, (5) lastly, the survey concludes with a summary of key findings and recommendations. Through this exploration of hybrid CV architectures, the survey aims to serve as a guiding resource, fostering a deeper understanding of the intricate dynamics between CNNs and ViTs and their collective impact on shaping the future of CV architectures.