CV · CL · Mar 11, 2024

Real-time Transformer-based Open-Vocabulary Detection with Efficient Fusion Head

Tiancheng Zhao, Peng Liu, Xuan He, Lu Zhang, Kyusong Lee

CMU

arXiv:2403.06892v2

Originality: Incremental advance

AI Analysis

This work targets real-time object detection for industrial applications, improving speed while maintaining accuracy. The contribution is incremental, since it builds on existing transformer-based detectors.

The paper tackles the high computational cost of transformer-based open-vocabulary object detection. It introduces OmDet-Turbo, whose Efficient Fusion Head enables real-time inference: the Base model reaches 100.2 FPS (with TensorRT and language caching) while delivering competitive accuracy on COCO, LVIS, ODinW, and OVDEval.

End-to-end transformer-based detectors (DETRs) have shown exceptional performance in closed-set object detection and, through the integration of the language modality, in open-vocabulary object detection (OVD). However, their heavy computational requirements have hindered their practical application in real-time object detection (OD) scenarios. In this paper, we scrutinize the limitations of two leading models on the OVDEval benchmark, OmDet and Grounding-DINO, and introduce OmDet-Turbo. This transformer-based real-time OVD model features an Efficient Fusion Head (EFH) module designed to alleviate the bottlenecks observed in OmDet and Grounding-DINO. OmDet-Turbo-Base reaches 100.2 frames per second (FPS) with TensorRT and language cache techniques applied. In zero-shot scenarios on the COCO and LVIS datasets, OmDet-Turbo performs nearly on par with current state-of-the-art supervised models. Furthermore, it sets new state-of-the-art results on ODinW and OVDEval, with an AP of 30.1 and an NMS-AP of 26.86, respectively. Its strong benchmark accuracy and fast inference make OmDet-Turbo a compelling choice for real-time object detection in industrial applications. Code: https://github.com/om-ai-lab/OmDet
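Part of the reported 100.2 FPS is credited to "language cache techniques". The sketch below illustrates the general idea under a stated assumption: text embeddings for each label or prompt are computed once and reused across frames, so a fixed detection vocabulary does not pay the text-encoder cost on every image. `LanguageCache`, `toy_text_encoder`, and the LRU size bound are hypothetical illustrations, not the OmDet-Turbo code.

```python
from collections import OrderedDict
from typing import Callable, List, Sequence

Embedding = List[float]


class LanguageCache:
    """Hypothetical LRU cache of text embeddings keyed by label/prompt string."""

    def __init__(self, encode_fn: Callable[[Sequence[str]], List[Embedding]], max_entries: int = 1024):
        self.encode_fn = encode_fn          # assumed: batch of strings -> list of embeddings
        self.max_entries = max_entries
        self._store: "OrderedDict[str, Embedding]" = OrderedDict()
        self.encoder_calls = 0              # number of batched encoder invocations
        self.misses = 0                     # number of labels that had to be encoded

    def get(self, labels: Sequence[str]) -> List[Embedding]:
        # Unique labels not yet cached, in first-seen order (duplicates encoded once).
        missing = [lab for lab in dict.fromkeys(labels) if lab not in self._store]
        if missing:
            self.encoder_calls += 1
            self.misses += len(missing)
            for lab, emb in zip(missing, self.encode_fn(missing)):
                self._store[lab] = emb
        out = []
        for lab in labels:
            self._store.move_to_end(lab)    # mark as most recently used
            out.append(self._store[lab])
        # Evict least recently used entries beyond the size bound.
        while len(self._store) > self.max_entries:
            self._store.popitem(last=False)
        return out


def toy_text_encoder(labels: Sequence[str]) -> List[Embedding]:
    # Stand-in for an expensive text encoder: a 3-dim "embedding" per string.
    return [[float(len(s)), float(sum(map(ord, s)) % 97), float(s.count(" "))] for s in labels]


if __name__ == "__main__":
    cache = LanguageCache(toy_text_encoder)
    first = cache.get(["person", "forklift", "safety helmet"])  # encodes 3 labels
    second = cache.get(["forklift", "person"])                  # served from cache
    print(cache.encoder_calls, cache.misses)                    # 1 3
```

Because the vocabulary in an industrial deployment is usually fixed or changes rarely, nearly every frame hits the cache and only the image branch and fusion head run per frame; the LRU bound simply keeps memory finite when prompts vary.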
