CV, AI, LG | Oct 24, 2021

CvT-ASSD: Convolutional vision-Transformer Based Attentive Single Shot MultiBox Detector

arXiv:2110.12364v1 | 16 citations | Has Code
Originality: Incremental advance
AI Analysis

This work addresses computational inefficiency in object detection for computer vision applications, but it appears incremental as it builds on existing methods like CvT and ASSD.

The paper tackles the high computational cost and reduced accuracy of Transformer-based object detectors by proposing CvT-ASSD, a hybrid architecture that combines the Convolutional vision Transformer (CvT) with the Attentive Single Shot MultiBox Detector (ASSD), and reports good efficiency and detection performance on PASCAL VOC and MS COCO.

Due to the success of Bidirectional Encoder Representations from Transformers (BERT) in natural language processing (NLP), multi-head attention Transformers have become increasingly prevalent in computer vision (CV) research. However, applying them to complex tasks such as visual detection and semantic segmentation remains a challenge. Although multiple Transformer-based architectures, such as DETR and ViT-FRCNN, have been proposed for object detection, they inevitably suffer from reduced discrimination accuracy and lower computational efficiency, caused by the enormous number of learnable parameters and the heavy computational complexity of the traditional self-attention operation. To alleviate these issues, we present a novel object detection architecture, named Convolutional vision Transformer Based Attentive Single Shot MultiBox Detector (CvT-ASSD), which is built on top of the Convolutional vision Transformer (CvT) combined with the efficient Attentive Single Shot MultiBox Detector (ASSD). We provide comprehensive empirical evidence showing that CvT-ASSD achieves good system efficiency and performance when pretrained on large-scale detection datasets such as PASCAL VOC and MS COCO. The code is publicly available on GitHub at https://github.com/albert-jin/CvT-ASSD.
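As a rough illustration of why standard self-attention becomes expensive, and of the kind of saving CvT's convolutional projection targets (it can subsample keys and values with a stride larger than 1), the hypothetical helper `attention_score_entries` below counts the query-key dot products in a single attention map. It assumes one head, a 3x3 projection kernel with padding 1, and an illustrative 56 x 56 token grid; these are assumptions for illustration, and the code is not taken from the CvT-ASSD repository.

```python
def attention_score_entries(h: int, w: int, kv_stride: int = 1) -> int:
    """Count query-key dot products in one single-head self-attention map
    computed over an h x w grid of tokens (hypothetical helper).

    kv_stride=1: standard self-attention, every token attends to every token.
    kv_stride=s > 1: keys and values first pass through a strided 3x3
    convolutional projection with padding 1 (an assumption), which leaves
    ceil(h/s) x ceil(w/s) key/value tokens; queries stay at full resolution.
    """
    if h <= 0 or w <= 0 or kv_stride <= 0:
        raise ValueError("grid sizes and stride must be positive")
    num_queries = h * w
    kv_h = -(-h // kv_stride)  # ceiling division: 3x3 conv, padding 1
    kv_w = -(-w // kv_stride)
    return num_queries * kv_h * kv_w


if __name__ == "__main__":
    # Hypothetical 56 x 56 token grid; numbers are illustrative only.
    full = attention_score_entries(56, 56)
    strided = attention_score_entries(56, 56, kv_stride=2)
    print(full, strided, full / strided)
```

The count is quadratic in the number of tokens: doubling each side of the grid multiplies the standard-attention count by 16, while a key/value stride of 2 cuts it by roughly a factor of 4. It covers only attention score entries, not projection or feed-forward costs.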

Code Implementations: 1 repo