22.8CVApr 6
Integration of Object Detection and Small VLMs for Construction Safety Hazard IdentificationMuhammad Adil, Mehmood Ahmed, Muhammad Aqib et al.
Accurate and timely identification of construction hazards around workers is essential for preventing workplace accidents. While large vision-language models (VLMs) demonstrate strong contextual reasoning capabilities, their high computational requirements limit their applicability in near real-time construction hazard detection. In contrast, small vision-language models (sVLMs) with fewer than 4 billion parameters offer improved efficiency but often suffer from reduced accuracy and hallucination when analyzing complex construction scenes. To address this trade-off, this study proposes a detection-guided sVLM framework that integrates object detection with multimodal reasoning for contextual hazard identification. The framework first employs a YOLOv11n detector to localize workers and construction machinery within the scene. The detected entities are then embedded into structured prompts to guide the reasoning process of sVLMs, enabling spatially grounded hazard assessment. Within this framework, six sVLMs (Gemma-3 4B, Qwen-3-VL 2B/4B, InternVL-3 1B/2B, and SmolVLM-2B) were evaluated in zero-shot settings on a curated dataset of construction site images with hazard annotations and explanatory rationales. The proposed approach consistently improved hazard detection performance across all models. The best-performing model, Gemma-3 4B, achieved an F1-score of 50.6%, compared to 34.5% in the baseline configuration. Explanation quality also improved significantly, with BERTScore F1 increasing from 0.61 to 0.82. Despite incorporating object detection, the framework introduces minimal overhead, adding only 2.5 ms per image during inference. These results demonstrate that integrating lightweight object detection with small VLM reasoning provides an effective and efficient solution for context-aware construction safety hazard detection.
AIJan 15, 2025Code
ANSR-DT: An Adaptive Neuro-Symbolic Learning and Reasoning Framework for Digital TwinsSafayat Bin Hakim, Muhammad Adil, Alvaro Velasquez et al.
In this paper, we propose an Adaptive Neuro-Symbolic Learning and Reasoning Framework for digital twin technology called ``ANSR-DT." Digital twins in industrial environments often struggle with interpretability, real-time adaptation, and human input integration. Our approach addresses these challenges by combining CNN-LSTM dynamic event detection with reinforcement learning and symbolic reasoning to enable adaptive intelligence with interpretable decision processes. This integration enhances environmental understanding while promoting continuous learning, leading to more effective real-time decision-making in human-machine collaborative applications. We evaluated ANSR-DT on synthetic industrial data, observing significant improvements over traditional approaches, with up to 99.5% accuracy for dynamic pattern recognition. The framework demonstrated superior adaptability with extended reinforcement learning training, improving explained variance from 0.447 to 0.547. Future work aims at scaling to larger datasets to test rule management beyond the current 14 rules. Our open-source implementation promotes reproducibility and establishes a foundation for future research in adaptive, interpretable digital twins for industrial applications.
CRSep 28, 2024
Decoding Android Malware with a Fraction of Features: An Attention-Enhanced MLP-SVM ApproachSafayat Bin Hakim, Muhammad Adil, Kamal Acharya et al.
The escalating sophistication of Android malware poses significant challenges to traditional detection methods, necessitating innovative approaches that can efficiently identify and classify threats with high precision. This paper introduces a novel framework that synergistically integrates an attention-enhanced Multi-Layer Perceptron (MLP) with a Support Vector Machine (SVM) to make Android malware detection and classification more effective. By carefully analyzing a mere 47 features out of over 9,760 available in the comprehensive CCCS-CIC-AndMal-2020 dataset, our MLP-SVM model achieves an impressive accuracy over 99% in identifying malicious applications. The MLP, enhanced with an attention mechanism, focuses on the most discriminative features and further reduces the 47 features to only 14 components using Linear Discriminant Analysis (LDA). Despite this significant reduction in dimensionality, the SVM component, equipped with an RBF kernel, excels in mapping these components to a high-dimensional space, facilitating precise classification of malware into their respective families. Rigorous evaluations, encompassing accuracy, precision, recall, and F1-score metrics, confirm the superiority of our approach compared to existing state-of-the-art techniques. The proposed framework not only significantly reduces the computational complexity by leveraging a compact feature set but also exhibits resilience against the evolving Android malware landscape.
CVApr 12, 2025
Using Vision Language Models for Safety Hazard Identification in ConstructionMuhammad Adil, Gaang Lee, Vicente A. Gonzalez et al.
Safety hazard identification and prevention are the key elements of proactive safety management. Previous research has extensively explored the applications of computer vision to automatically identify hazards from image clips collected from construction sites. However, these methods struggle to identify context-specific hazards, as they focus on detecting predefined individual entities without understanding their spatial relationships and interactions. Furthermore, their limited adaptability to varying construction site guidelines and conditions hinders their generalization across different projects. These limitations reduce their ability to assess hazards in complex construction environments and adaptability to unseen risks, leading to potential safety gaps. To address these challenges, we proposed and experimentally validated a Vision Language Model (VLM)-based framework for the identification of construction hazards. The framework incorporates a prompt engineering module that structures safety guidelines into contextual queries, allowing VLM to process visual information and generate hazard assessments aligned with the regulation guide. Within this framework, we evaluated state-of-the-art VLMs, including GPT-4o, Gemini, Llama 3.2, and InternVL2, using a custom dataset of 1100 construction site images. Experimental results show that GPT-4o and Gemini 1.5 Pro outperformed alternatives and displayed promising BERTScore of 0.906 and 0.888 respectively, highlighting their ability to identify both general and context-specific hazards. However, processing times remain a significant challenge, impacting real-time feasibility. These findings offer insights into the practical deployment of VLMs for construction site hazard detection, thereby contributing to the enhancement of proactive safety management.
CRSep 8, 2025
Neuro-Symbolic AI for Cybersecurity: State of the Art, Challenges, and OpportunitiesSafayat Bin Hakim, Muhammad Adil, Alvaro Velasquez et al.
Traditional Artificial Intelligence (AI) approaches in cybersecurity exhibit fundamental limitations: inadequate conceptual grounding leading to non-robustness against novel attacks; limited instructibility impeding analyst-guided adaptation; and misalignment with cybersecurity objectives. Neuro-Symbolic (NeSy) AI has emerged with the potential to revolutionize cybersecurity AI. However, there is no systematic understanding of this emerging approach. These hybrid systems address critical cybersecurity challenges by combining neural pattern recognition with symbolic reasoning, enabling enhanced threat understanding while introducing concerning autonomous offensive capabilities that reshape threat landscapes. In this survey, we systematically characterize this field by analyzing 127 publications spanning 2019-July 2025. We introduce a Grounding-Instructibility-Alignment (G-I-A) framework to evaluate these systems, focusing on both cyber defense and cyber offense across network security, malware analysis, and cyber operations. Our analysis shows advantages of multi-agent NeSy architectures and identifies critical implementation challenges including standardization gaps, computational complexity, and human-AI collaboration requirements that constrain deployment. We show that causal reasoning integration is the most transformative advancement, enabling proactive defense beyond correlation-based approaches. Our findings highlight dual-use implications where autonomous systems demonstrate substantial capabilities in zero-day exploitation while achieving significant cost reductions, altering threat dynamics. We provide insights and future research directions, emphasizing the urgent need for community-driven standardization frameworks and responsible development practices that ensure advancement serves defensive cybersecurity objectives while maintaining societal alignment.
AIJun 15, 2025
SymRAG: Efficient Neuro-Symbolic Retrieval Through Adaptive Query RoutingSafayat Bin Hakim, Muhammad Adil, Alvaro Velasquez et al.
Current Retrieval-Augmented Generation systems use uniform processing, causing inefficiency as simple queries consume resources similar to complex multi-hop tasks. We present SymRAG, a framework that introduces adaptive query routing via real-time complexity and load assessment to select symbolic, neural, or hybrid pathways. SymRAG's neuro-symbolic approach adjusts computational pathways based on both query characteristics and system load, enabling efficient resource allocation across diverse query types. By combining linguistic and structural query properties with system load metrics, SymRAG allocates resources proportional to reasoning requirements. Evaluated on 2,000 queries across HotpotQA (multi-hop reasoning) and DROP (discrete reasoning) using Llama-3.2-3B and Mistral-7B models, SymRAG achieves competitive accuracy (97.6--100.0% exact match) with efficient resource utilization (3.6--6.2% CPU utilization, 0.985--3.165s processing). Disabling adaptive routing increases processing time by 169--1151%, showing its significance for complex models. These results suggest adaptive computation strategies are more sustainable and scalable for hybrid AI systems that use dynamic routing and neuro-symbolic frameworks.
ROJan 28, 2022
Machine Learning Based Relative Orbit Transfer for Swarm Spacecraft Motion PlanningAlex Sabol, Kyongsik Yun, Muhammad Adil et al.
In this paper we describe a machine learning based framework for spacecraft swarm trajectory planning. In particular, we focus on coordinating motions of multi-spacecraft in formation flying through passive relative orbit(PRO) transfers. Accounting for spacecraft dynamics while avoiding collisions between the agents makes spacecraft swarm trajectory planning difficult. Centralized approaches can be used to solve this problem, but are computationally demanding and scale poorly with the number of agents in the swarm. As a result, centralized algorithms are ill-suited for real time trajectory planning on board small spacecraft (e.g. CubeSats) comprising the swarm. In our approach a neural network is used to approximate solutions of a centralized method. The necessary training data is generated using a centralized convex optimization framework through which several instances of the n=10 spacecraft swarm trajectory planning problem are solved. We are interested in answering the following questions which will give insight on the potential utility of deep learning-based approaches to the multi-spacecraft motion planning problem: 1) Can neural networks produce feasible trajectories that satisfy safety constraints (e.g. collision avoidance) and low in fuel cost? 2) Can a neural network trained using n spacecraft data be used to solve problems for spacecraft swarms of differing size?
RONov 28, 2021
Optimal Multi-Robot Motion Planning via Parabolic RelaxationChangrak Choi, Muhammad Adil, Amir Rahmani et al.
Multi-robot systems offer enhanced capability over their monolithic counterparts, but they come at a cost of increased complexity in coordination. To reduce complexity and to make the problem tractable, multi-robot motion planning (MRMP) methods in the literature adopt de-coupled approaches that sacrifice either optimality or dynamic feasibility. In this paper, we present a convexification method, namely "parabolic relaxation", to generate optimal and dynamically feasible trajectories for MRMP in the coupled joint-space of all robots. We leverage upon the proposed relaxation to tackle the problem complexity and to attain computational tractability for planning over one hundred robots in extremely clustered environments. We take a multi-stage optimization approach that consists of i) mathematically formulating MRMP as a non-convex optimization, ii) lifting the problem into a higher dimensional space, iii) convexifying the problem through the proposed computationally efficient parabolic relaxation, and iv) penalizing with iterative search to ensure feasibility and recovery of feasible and near-optimal solutions to the original problem. Our numerical experiments demonstrate that the proposed approach is capable of generating optimal and dynamically feasible trajectories for challenging motion planning problems with higher success rate than the state-of-the-art, yet remain computationally tractable for over one hundred robots in a highly dense environment.
CRMay 17, 2021
Hash-MAC-DSDV: Mutual Authentication for Intelligent IoT-Based Cyber-Physical SystemsMuhammad Adil, Mian Ahmad Jan, Spyridon Mastorakis et al.
Cyber-Physical Systems (CPS) connected in the form of Internet of Things (IoT) are vulnerable to various security threats, due to the infrastructure-less deployment of IoT devices. Device-to-Device (D2D) authentication of these networks ensures the integrity, authenticity, and confidentiality of information in the deployed area. The literature suggests different approaches to address security issues in CPS technologies. However, they are mostly based on centralized techniques or specific system deployments with higher cost of computation and communication. It is therefore necessary to develop an effective scheme that can resolve the security problems in CPS technologies of IoT devices. In this paper, a lightweight Hash-MAC-DSDV (Hash Media Access Control Destination Sequence Distance Vector) routing scheme is proposed to resolve authentication issues in CPS technologies, connected in the form of IoT networks. For this purpose, a CPS of IoT devices (multi-WSNs) is developed from the local-chain and public chain, respectively. The proposed scheme ensures D2D authentication by the Hash-MAC-DSDV mutual scheme, where the MAC addresses of individual devices are registered in the first phase and advertised in the network in the second phase. The proposed scheme allows legitimate devices to modify their routing table and unicast the one-way hash authentication mechanism to transfer their captured data from source towards the destination. Our evaluation results demonstrate that Hash- MAC-DSDV outweighs the existing schemes in terms of attack detection, energy consumption and communication metrics.
CRApr 30, 2021
LightIoT: Lightweight and Secure Communication for Energy-Efficient IoT in Health InformaticsMian Ahmad Jan, Fazlullah Khan, Spyridon Mastorakis et al.
Internet of Things (IoT) is considered as a key enabler of health informatics. IoT-enabled devices are used for in-hospital and in-home patient monitoring to collect and transfer biomedical data pertaining to blood pressure, electrocardiography (ECG), blood sugar levels, body temperature, etc. Among these devices, wearables have found their presence in a wide range of healthcare applications. These devices generate data in real-time and transmit them to nearby gateways and remote servers for processing and visualization. The data transmitted by these devices are vulnerable to a range of adversarial threats, and as such, privacy and integrity need to be preserved. In this paper, we present LightIoT, a lightweight and secure communication approach for data exchanged among the devices of a healthcare infrastructure. LightIoT operates in three phases: initialization, pairing, and authentication. These phases ensure the reliable transmission of data by establishing secure sessions among the communicating entities (wearables, gateways and a remote server). Statistical results exhibit that our scheme is lightweight, robust, and resilient against a wide range of adversarial attacks and incurs much lower computational and communication overhead for the transmitted data in the presence of existing approaches.
CVApr 21, 2021
Rapid Detection of Aircrafts in Satellite Imagery based on Deep Neural NetworksArsalan Tahir, Muhammad Adil, Arslan Ali
Object detection is one of the fundamental objectives in Applied Computer Vision. In some of the applications, object detection becomes very challenging such as in the case of satellite image processing. Satellite image processing has remained the focus of researchers in domains of Precision Agriculture, Climate Change, Disaster Management, etc. Therefore, object detection in satellite imagery is one of the most researched problems in this domain. This paper focuses on aircraft detection. in satellite imagery using deep learning techniques. In this paper, we used YOLO deep learning framework for aircraft detection. This method uses satellite images collected by different sources as learning for the model to perform detection. Object detection in satellite images is mostly complex because objects have many variations, types, poses, sizes, complex and dense background. YOLO has some limitations for small size objects (less than$\sim$32 pixels per object), therefore we upsample the prediction grid to reduce the coarseness of the model and to accurately detect the densely clustered objects. The improved model shows good accuracy and performance on different unknown images having small, rotating, and dense objects to meet the requirements in real-time.
ROOct 15, 2020
Multi-Agent Motion Planning using Deep Learning for Space ApplicationsKyongsik Yun, Changrak Choi, Ryan Alimo et al.
State-of-the-art motion planners cannot scale to a large number of systems. Motion planning for multiple agents is an NP (non-deterministic polynomial-time) hard problem, so the computation time increases exponentially with each addition of agents. This computational demand is a major stumbling block to the motion planner's application to future NASA missions involving the swarm of space vehicles. We applied a deep neural network to transform computationally demanding mathematical motion planning problems into deep learning-based numerical problems. We showed optimal motion trajectories can be accurately replicated using deep learning-based numerical models in several 2D and 3D systems with multiple agents. The deep learning-based numerical model demonstrates superior computational efficiency with plans generated 1000 times faster than the mathematical model counterpart.