CV · Apr 6
Integration of Object Detection and Small VLMs for Construction Safety Hazard Identification
Muhammad Adil, Mehmood Ahmed, Muhammad Aqib et al.
Accurate and timely identification of construction hazards around workers is essential for preventing workplace accidents. While large vision-language models (VLMs) demonstrate strong contextual reasoning capabilities, their high computational requirements limit their applicability in near real-time construction hazard detection. In contrast, small vision-language models (sVLMs) with fewer than 4 billion parameters offer improved efficiency but often suffer from reduced accuracy and hallucination when analyzing complex construction scenes. To address this trade-off, this study proposes a detection-guided sVLM framework that integrates object detection with multimodal reasoning for contextual hazard identification. The framework first employs a YOLOv11n detector to localize workers and construction machinery within the scene. The detected entities are then embedded into structured prompts to guide the reasoning process of sVLMs, enabling spatially grounded hazard assessment. Within this framework, six sVLMs (Gemma-3 4B, Qwen-3-VL 2B/4B, InternVL-3 1B/2B, and SmolVLM-2B) were evaluated in zero-shot settings on a curated dataset of construction site images with hazard annotations and explanatory rationales. The proposed approach consistently improved hazard detection performance across all models. The best-performing model, Gemma-3 4B, achieved an F1-score of 50.6%, compared to 34.5% in the baseline configuration. Explanation quality also improved significantly, with BERTScore F1 increasing from 0.61 to 0.82. Despite incorporating object detection, the framework introduces minimal overhead, adding only 2.5 ms per image during inference. These results demonstrate that integrating lightweight object detection with small VLM reasoning provides an effective and efficient solution for context-aware construction safety hazard detection.
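The step where detected entities are "embedded into structured prompts" can be illustrated with a minimal sketch. It assumes each detection arrives as a class label, a confidence score, and a pixel box in (x1, y1, x2, y2) form. The names `Detection` and `build_structured_prompt`, the 0.25 confidence cut-off, the `worker` label, and the centre-distance proximity heuristic are all hypothetical choices for illustration, not the authors' published prompt format.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Detection:
    """Hypothetical detector output: class label, confidence, pixel box (x1, y1, x2, y2)."""
    label: str
    confidence: float
    box: tuple[float, float, float, float]


def center(box):
    """Centre point of an (x1, y1, x2, y2) box."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)


def build_structured_prompt(detections, width, height, min_conf=0.25, near_frac=0.2):
    """Serialize detections into a text prompt that grounds the sVLM's hazard reasoning.

    Assumptions (illustrative only): low-confidence boxes are dropped, coordinates are
    normalized to [0, 1], and a worker/machine pair counts as "near" when the distance
    between box centres is below `near_frac` of the image diagonal.
    """
    kept = [d for d in detections if d.confidence >= min_conf]
    diag = hypot(width, height)

    lines = ["Detected entities (normalized box coordinates x1, y1, x2, y2):"]
    for i, d in enumerate(kept, start=1):
        x1, y1, x2, y2 = d.box
        coords = f"{x1 / width:.2f}, {y1 / height:.2f}, {x2 / width:.2f}, {y2 / height:.2f}"
        lines.append(f"{i}. {d.label} (conf {d.confidence:.2f}) at [{coords}]")

    # Split entities into workers and everything else (treated here as machinery).
    workers = [(i, d) for i, d in enumerate(kept, start=1) if d.label == "worker"]
    machines = [(i, d) for i, d in enumerate(kept, start=1) if d.label != "worker"]

    lines.append("Worker-machinery pairs closer than the proximity threshold:")
    pairs = []
    for wi, w in workers:
        for mi, m in machines:
            (wx, wy), (mx, my) = center(w.box), center(m.box)
            if hypot(wx - mx, wy - my) / diag < near_frac:
                pairs.append(f"- entity {wi} ({w.label}) near entity {mi} ({m.label})")
    lines.extend(pairs or ["- none"])

    lines.append("Using only the entities above, list any safety hazards to workers and explain each one.")
    return "\n".join(lines)


# Usage with made-up detections on a 640x480 image.
dets = [
    Detection("worker", 0.91, (100, 200, 160, 380)),
    Detection("excavator", 0.88, (160, 180, 400, 400)),
    Detection("worker", 0.12, (600, 50, 620, 90)),  # below min_conf, so dropped
]
print(build_structured_prompt(dets, 640, 480))
```

Normalizing coordinates keeps the prompt independent of image resolution, and precomputing proximity pairs hands the small model an explicit spatial relation instead of relying on it to infer one from pixels, which is one plausible reading of "spatially grounded hazard assessment".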
CV · Apr 12, 2025
Using Vision Language Models for Safety Hazard Identification in Construction
Muhammad Adil, Gaang Lee, Vicente A. Gonzalez et al.
Safety hazard identification and prevention are key elements of proactive safety management. Previous research has extensively explored applying computer vision to automatically identify hazards in image clips collected from construction sites. However, these methods struggle to identify context-specific hazards because they focus on detecting predefined individual entities without understanding their spatial relationships and interactions. Furthermore, their limited adaptability to varying construction site guidelines and conditions hinders their generalization across projects. These limitations reduce their ability to assess hazards in complex construction environments and to adapt to unseen risks, leaving potential safety gaps. To address these challenges, we propose and experimentally validate a Vision Language Model (VLM)-based framework for identifying construction hazards. The framework incorporates a prompt engineering module that structures safety guidelines into contextual queries, allowing the VLM to process visual information and generate hazard assessments aligned with the applicable regulatory guidelines. Within this framework, we evaluated state-of-the-art VLMs, including GPT-4o, Gemini, Llama 3.2, and InternVL2, on a custom dataset of 1,100 construction site images. Experimental results show that GPT-4o and Gemini 1.5 Pro outperformed the alternatives, achieving promising BERTScores of 0.906 and 0.888, respectively, which highlights their ability to identify both general and context-specific hazards. However, processing times remain a significant challenge and limit real-time feasibility. These findings offer insights into the practical deployment of VLMs for construction site hazard detection, contributing to the enhancement of proactive safety management.
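The idea of a prompt engineering module that "structures safety guidelines into contextual queries" can be sketched as a query builder paired with a parser for the model's reply. This is a minimal illustration under stated assumptions: the function names, the one-line-per-rule `HAZARD|SAFE` answer format, and the example rules are hypothetical and are not taken from the paper or from any regulation.

```python
import re


def build_guideline_query(guidelines, site_context=""):
    """Hypothetical query builder: number each guideline and fix the answer format."""
    lines = ["You are a construction safety inspector reviewing the attached image."]
    if site_context:
        lines.append(f"Site context: {site_context}")
    lines.append("Check the image against each guideline below:")
    for i, rule in enumerate(guidelines, start=1):
        lines.append(f"Rule {i}: {rule}")
    lines.append("Answer one line per rule in the form 'Rule <n>: HAZARD|SAFE - <reason>'.")
    return "\n".join(lines)


# One answer line per rule, e.g. "Rule 2: HAZARD - worker inside swing radius".
_ANSWER = re.compile(r"^Rule\s+(\d+):\s*(HAZARD|SAFE)\s*-\s*(.+)$", re.IGNORECASE)


def parse_assessment(response):
    """Map rule number -> (is_hazard, reason); lines not in the expected format are ignored."""
    results = {}
    for line in response.splitlines():
        match = _ANSWER.match(line.strip())
        if match:
            is_hazard = match.group(2).upper() == "HAZARD"
            results[int(match.group(1))] = (is_hazard, match.group(3).strip())
    return results


# Usage with illustrative rules and a made-up model reply.
rules = [
    "Workers must wear hard hats.",
    "Workers must stay outside the swing radius of operating excavators.",
]
print(build_guideline_query(rules, "earthworks with an active excavator"))
reply = "Rule 1: SAFE - all workers wear hard hats\nRule 2: HAZARD - a worker stands beside the excavator bucket"
print(parse_assessment(reply))
```

Swapping the `rules` list is how such a design would adapt to different site guidelines without retraining, which is the adaptability gap the abstract attributes to entity-detection methods.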
OH · Dec 11, 2019
Non-linearity identification for construction workers' personality-safety behaviour predictive relationship using neural network and linear regression modelling
Yifan Gao, Vicente A. Gonzalez, Tak Wing Yiu et al.
Predicting workers' safety behaviour can help identify vulnerable workers who intend to undertake unsafe behaviours and can inform the design of management practices that minimise the occurrence of accidents. Recent literature has shown that people's intended safety behaviours in the workplace differ within a population, varying among individuals as a function of their personality traits. In this study, an innovative forecasting model based on neural network algorithms is developed to numerically simulate the predictive relationship between construction workers' personality traits and their intended safety behaviour. The data-driven nature of the neural network enabled a reliable estimate of this relationship, which revealed that it contains a nonlinear effect. This research has practical implications: the developed neural network achieves highly satisfactory prediction accuracy and is therefore potentially useful for helping project decision-makers assess how prone workers are to carry out unsafe behaviours in the workplace.