CV · Oct 16, 2024

Evaluating Cascaded Methods of Vision-Language Models for Zero-Shot Detection and Association of Hardhats for Increased Construction Safety

arXiv:2410.12225v1 · 3 citations · h-index: 2 · 2024 IEEE MIT Undergraduate Research Technology Conference (URTC)
Originality: Synthesis-oriented
AI Analysis

This work addresses construction safety by automating hardhat detection. Its contribution is incremental: it applies existing models to a new domain, adding a curated dataset and a cascaded detection method.

The paper tackles hardhat detection on construction sites by evaluating vision-language models, specifically OWLv2, in a zero-shot setting; OWLv2 reaches an average precision of 0.6493 on a dataset of 5,210 images.
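The summary reports a single average precision (AP) figure but not the evaluation protocol behind it. As a hedged illustration of what such a number measures, the sketch below computes all-point interpolated AP (PASCAL VOC style) from detections that have already been labelled as true or false positives against ground truth, for example at an IoU threshold of 0.5. The matching step, the IoU threshold, and the function name `average_precision` are assumptions, not details taken from the paper.

```python
from typing import List, Tuple


def average_precision(detections: List[Tuple[float, bool]], num_gt: int) -> float:
    """All-point interpolated AP (PASCAL VOC 2010+ style); an assumed protocol.

    detections: (confidence, is_true_positive) pairs; matching against
        ground truth (e.g. IoU >= 0.5) is assumed to be done already.
    num_gt: number of ground-truth hardhat boxes in the evaluation set.
    """
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")

    # Rank detections from most to least confident.
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)

    recalls: List[float] = []
    precisions: List[float] = []
    tp = fp = 0
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))

    # Pad the curve so it spans recall 0 to 1.
    mrec = [0.0] + recalls + [1.0]
    mpre = [0.0] + precisions + [0.0]

    # Precision envelope: each entry becomes the best precision
    # achieved at the same or any higher recall.
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])

    # Sum rectangle areas; steps where recall does not change add zero.
    return sum((mrec[i] - mrec[i - 1]) * mpre[i] for i in range(1, len(mrec)))


# Toy example: two ground-truth hardhats, three ranked detections.
print(round(average_precision([(0.9, True), (0.8, False), (0.7, True)], num_gt=2), 4))  # 0.8333
```

On this reading, an AP of 0.6493 summarizes the whole precision-recall curve over all confidence thresholds rather than the accuracy at one fixed threshold. COCO-style evaluation averages over several IoU thresholds instead, so the two protocols can give different numbers for the same detector.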

This paper evaluates the use of vision-language models (VLMs) for zero-shot detection and association of hardhats to enhance construction safety. Given the significant risk of head injuries in construction, proper enforcement of hardhat use is critical. We investigate the applicability of foundation models, specifically OWLv2, for detecting hardhats in real-world construction site images. Our contributions are a new benchmark dataset, the Hardhat Safety Detection Dataset, built by filtering and combining existing datasets, and a cascaded detection approach. Experimental results on 5,210 images demonstrate that the OWLv2 model achieves an average precision of 0.6493 for hardhat detection. We further analyze the limitations and potential improvements for real-world applications, highlighting the strengths and weaknesses of current foundation models in safety perception domains.
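The abstract names a cascaded detection approach and the association of hardhats with workers but does not describe either step. A minimal sketch of one plausible association stage follows. It assumes a first stage has already produced person boxes and a second stage (for instance OWLv2 queried with a text prompt such as "hardhat") has produced hardhat boxes. Each hardhat is matched to the person whose head region contains most of its area, using greedy one-to-one assignment. The helper names, the head region defined as the top 30% of the person box, and the 0.5 overlap threshold are all hypothetical choices, not the authors' parameters.

```python
from typing import List, Optional, Tuple

# Box format assumed here: (x1, y1, x2, y2) in pixels, with y growing downward.
Box = Tuple[float, float, float, float]


def intersection_area(a: Box, b: Box) -> float:
    """Area of overlap between two axis-aligned boxes (0 if disjoint)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, w) * max(0.0, h)


def head_region(person: Box, head_fraction: float = 0.3) -> Box:
    """Hypothetical head region: the top `head_fraction` of a person box."""
    x1, y1, x2, y2 = person
    return (x1, y1, x2, y1 + head_fraction * (y2 - y1))


def associate_hardhats(
    persons: List[Box], hardhats: List[Box], min_overlap: float = 0.5
) -> List[Optional[int]]:
    """Return, for each person, the index of its matched hardhat or None.

    A hardhat is a candidate for a person when at least `min_overlap` of
    the hardhat's area lies inside that person's head region. Candidates
    are assigned greedily, highest overlap first, one hardhat per person.
    """
    candidates = []
    for p_idx, person in enumerate(persons):
        head = head_region(person)
        for h_idx, hat in enumerate(hardhats):
            hat_area = (hat[2] - hat[0]) * (hat[3] - hat[1])
            if hat_area <= 0:
                continue  # skip degenerate boxes
            frac = intersection_area(head, hat) / hat_area
            if frac >= min_overlap:
                candidates.append((frac, p_idx, h_idx))

    candidates.sort(reverse=True)
    assignment: List[Optional[int]] = [None] * len(persons)
    used = set()
    for _, p_idx, h_idx in candidates:
        if assignment[p_idx] is None and h_idx not in used:
            assignment[p_idx] = h_idx
            used.add(h_idx)
    return assignment


persons = [(0, 0, 100, 300), (200, 0, 300, 300)]
hardhats = [(20, -10, 80, 40)]
print(associate_hardhats(persons, hardhats))  # [0, None]
```

Under these assumptions, a person whose entry is `None` is a candidate for a missing-hardhat alert. Greedy matching keeps the sketch short; an optimal assignment such as the Hungarian algorithm would be a natural alternative when many workers stand close together.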
