CV | RO | Jan 16, 2025

Are Open-Vocabulary Models Ready for Detection of MEP Elements on Construction Sites

arXiv:2501.09267v2 | 1 citation | h-index: 12 | Proceedings of the International Symposium on Automation and Robotics in Construction (IAARC)
Originality: Synthesis-oriented
AI Analysis

This work addresses the challenge of automating MEP monitoring in the construction industry, but it is incremental: it compares existing methods rather than introducing new ones.

The study evaluates computer-vision detection of mechanical, electrical, and plumbing (MEP) elements on construction sites and finds that fine-tuned lightweight models largely outperform open-vocabulary vision-language models in this specialized domain.

The construction industry has long explored robotics and computer vision, yet their deployment on construction sites remains very limited. These technologies have the potential to revolutionize traditional workflows by enhancing accuracy, efficiency, and safety in construction management. Ground robots equipped with advanced vision systems could automate tasks such as monitoring mechanical, electrical, and plumbing (MEP) systems. The present research evaluates the applicability of open-vocabulary vision-language models compared to fine-tuned, lightweight, closed-set object detectors for detecting MEP components using a mobile ground robotic platform. A dataset collected with cameras mounted on a ground robot was manually annotated and analyzed to compare model performance. The results demonstrate that, despite the versatility of vision-language models, fine-tuned lightweight models still largely outperform them in specialized environments and for domain-specific tasks.

