Gaang Lee

CV
h-index: 5
Papers: 3
Citations: 14
Novelty: 42%
AI Score: 36

3 Papers

CV · Apr 6
Integration of Object Detection and Small VLMs for Construction Safety Hazard Identification

Muhammad Adil, Mehmood Ahmed, Muhammad Aqib et al.

Accurate and timely identification of construction hazards around workers is essential for preventing workplace accidents. While large vision-language models (VLMs) demonstrate strong contextual reasoning capabilities, their high computational requirements limit their applicability in near real-time construction hazard detection. In contrast, small vision-language models (sVLMs) with fewer than 4 billion parameters offer improved efficiency but often suffer from reduced accuracy and hallucination when analyzing complex construction scenes. To address this trade-off, this study proposes a detection-guided sVLM framework that integrates object detection with multimodal reasoning for contextual hazard identification. The framework first employs a YOLOv11n detector to localize workers and construction machinery within the scene. The detected entities are then embedded into structured prompts to guide the reasoning process of sVLMs, enabling spatially grounded hazard assessment. Within this framework, six sVLMs (Gemma-3 4B, Qwen-3-VL 2B/4B, InternVL-3 1B/2B, and SmolVLM-2B) were evaluated in zero-shot settings on a curated dataset of construction site images with hazard annotations and explanatory rationales. The proposed approach consistently improved hazard detection performance across all models. The best-performing model, Gemma-3 4B, achieved an F1-score of 50.6%, compared to 34.5% in the baseline configuration. Explanation quality also improved significantly, with BERTScore F1 increasing from 0.61 to 0.82. Despite incorporating object detection, the framework introduces minimal overhead, adding only 2.5 ms per image during inference. These results demonstrate that integrating lightweight object detection with small VLM reasoning provides an effective and efficient solution for context-aware construction safety hazard detection.
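The sketch below shows the detection-guided prompting step described above. It assumes detections arrive as labeled boxes with confidence scores. The `Detection` class, the 0.5 confidence cutoff, the rule that treats every non-"worker" label as machinery, and the prompt wording are all hypothetical illustrations, not the authors' implementation. The YOLOv11n detector and the sVLM call are outside its scope.

```python
import math
from dataclasses import dataclass


@dataclass
class Detection:
    label: str          # hypothetical class name, e.g. "worker" or "excavator"
    confidence: float   # detector score in [0, 1]
    box: tuple          # (x1, y1, x2, y2) in pixel coordinates


def center(box):
    """Return the centre point of an (x1, y1, x2, y2) box."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)


def build_prompt(detections, min_conf=0.5):
    """Embed detected entities and worker-machine distances into a text prompt."""
    kept = [d for d in detections if d.confidence >= min_conf]
    if kept:
        entity_block = "\n".join(
            f"{i}. {d.label} (conf {d.confidence:.2f}) at box "
            f"[{d.box[0]:.0f}, {d.box[1]:.0f}, {d.box[2]:.0f}, {d.box[3]:.0f}]"
            for i, d in enumerate(kept, start=1)
        )
    else:
        entity_block = "No workers or machinery detected."

    # Assumption: any label other than "worker" is treated as machinery.
    workers = [(i, d) for i, d in enumerate(kept, start=1) if d.label == "worker"]
    machines = [(i, d) for i, d in enumerate(kept, start=1) if d.label != "worker"]

    # Pixel distance between box centres for every worker/machine pair.
    pair_lines = [
        f"entity {wi} to entity {mi}: {math.dist(center(w.box), center(m.box)):.0f} px"
        for wi, w in workers
        for mi, m in machines
    ]
    pair_block = "\n".join(pair_lines) if pair_lines else "none"

    return (
        "You are a construction safety inspector.\n"
        f"Detected entities:\n{entity_block}\n"
        f"Worker-machine distances:\n{pair_block}\n"
        "Using the image and the entities above, identify any safety hazards "
        "and explain the reasoning for each."
    )


dets = [
    Detection("worker", 0.91, (0, 0, 10, 10)),
    Detection("excavator", 0.88, (30, 40, 50, 60)),
    Detection("crane", 0.20, (0, 0, 1, 1)),  # dropped: below min_conf
]
print(build_prompt(dets))
```

Box-centre distances are one simple way to make the prompt spatially grounded. A real pipeline might use box overlap, relative position, or depth cues instead.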

CV · Apr 12, 2025
Using Vision Language Models for Safety Hazard Identification in Construction

Muhammad Adil, Gaang Lee, Vicente A. Gonzalez et al.

Safety hazard identification and prevention are key elements of proactive safety management. Previous research has extensively explored the application of computer vision to automatically identify hazards in image clips collected from construction sites. However, these methods struggle to identify context-specific hazards, as they focus on detecting predefined individual entities without understanding their spatial relationships and interactions. Furthermore, their limited adaptability to varying construction site guidelines and conditions hinders their generalization across different projects. These limitations reduce their ability to assess hazards in complex construction environments and to adapt to unseen risks, leading to potential safety gaps. To address these challenges, we proposed and experimentally validated a Vision Language Model (VLM)-based framework for identifying construction hazards. The framework incorporates a prompt engineering module that structures safety guidelines into contextual queries, allowing the VLM to process visual information and generate hazard assessments aligned with the regulatory guidelines. Within this framework, we evaluated state-of-the-art VLMs, including GPT-4o, Gemini, Llama 3.2, and InternVL2, using a custom dataset of 1,100 construction site images. Experimental results show that GPT-4o and Gemini 1.5 Pro outperformed the alternatives, achieving promising BERTScores of 0.906 and 0.888, respectively, which highlights their ability to identify both general and context-specific hazards. However, processing times remain a significant challenge, limiting real-time feasibility. These findings offer insights into the practical deployment of VLMs for construction site hazard detection, thereby contributing to the enhancement of proactive safety management.

SP · Aug 14, 2019
Assessing Workers' Perceived Risk During Construction Task Using A Wristband-Type Biosensor

Byungjoo Choi, Gaang Lee, Houtan Jebelli et al.

The construction industry has a high frequency and severity of accidents. Construction accidents result from the interaction between unsafe working conditions and workers' unsafe behaviors. Given this relationship, perceived risk is determined by an individual's response to a potential hazard during work. As such, risk perception is critical to understanding workers' unsafe behaviors. Established methods of assessing workers' perceived risk have mainly relied on surveys and interviews. However, these post-hoc methods are limited in their ability to monitor dynamic changes in risk perception, and conducting surveys at a construction site may prove cumbersome to workers. Additionally, these methods frequently suffer from self-report bias. To overcome the limitations of previous subjective measures, this study aims to develop a framework for the objective and continuous prediction of construction workers' perceived risk using physiological signals [e.g., electrodermal activity (EDA)] acquired from wristband-type biosensors worn by workers. To achieve this objective, physiological signals were collected from eight construction workers while they performed regular tasks in the field. Various filtering methods were applied to remove noise from the signals, and various features were extracted from the signals as workers experienced different risk levels. Then, a supervised machine-learning model was trained to explore the applicability of the collected physiological signals for predicting risk perception. The results showed that features based on EDA data collected from wristbands are feasible and useful for continuously monitoring workers' perceived risk during ongoing work. This study contributes to an in-depth understanding of construction workers' perceived risk by developing a noninvasive means of continuously monitoring it.
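The sketch below covers the filtering and feature-extraction stage described above. It uses only the standard library. The centred moving-average filter, the window size and step, the peak threshold, and the five features are illustrative assumptions, not the filtering methods or features used in the study. In a full pipeline, each feature dictionary would be paired with a risk-level label and passed to a supervised classifier.

```python
import statistics


def moving_average(signal, k=5):
    """Smooth a 1-D signal with a centred moving average (k must be odd)."""
    if k < 1 or k % 2 == 0:
        raise ValueError("k must be a positive odd integer")
    half = k // 2
    out = []
    for i in range(len(signal)):
        # The window shrinks at the edges instead of padding.
        window = signal[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out


def count_peaks(signal, min_rise=0.05):
    """Count local maxima that exceed both neighbours by at least min_rise."""
    return sum(
        1
        for i in range(1, len(signal) - 1)
        if signal[i] - signal[i - 1] >= min_rise and signal[i] - signal[i + 1] >= min_rise
    )


def windows(signal, size, step):
    """Split a long recording into fixed-length, possibly overlapping segments."""
    return [signal[s:s + size] for s in range(0, len(signal) - size + 1, step)]


def eda_features(window, k=5):
    """Compute simple features from one EDA segment (needs at least 2 samples)."""
    smoothed = moving_average(window, k)
    return {
        "mean": statistics.fmean(smoothed),
        "std": statistics.pstdev(smoothed),
        "range": max(smoothed) - min(smoothed),
        "slope": (smoothed[-1] - smoothed[0]) / (len(smoothed) - 1),
        "peaks": count_peaks(smoothed),
    }
```

For example, `[eda_features(w) for w in windows(recording, size=40, step=20)]` turns a hypothetical `recording` list into one feature row per segment.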