Abdullah Yahya Abdullah Omaisan

h-index1
2papers

2 Papers

CVOct 15, 2025
InfraGPT Smart Infrastructure: An End-to-End VLM-Based Framework for Detecting and Managing Urban Defects

Ibrahim Sheikh Mohamed, Abdullah Yahya Abdullah Omaisan

Infrastructure in smart cities is increasingly monitored by networks of closed circuit television (CCTV) cameras. Roads, bridges and tunnels develop cracks, potholes, and fluid leaks that threaten public safety and require timely repair. Manual inspection is costly and hazardous, and existing automatic systems typically address individual defect types or provide unstructured outputs that cannot directly guide maintenance crews. This paper proposes a comprehensive pipeline that leverages street CCTV streams for multi defect detection and segmentation using the YOLO family of object detectors and passes the detections to a vision language model (VLM) for scene aware summarization. The VLM generates a structured action plan in JSON format that includes incident descriptions, recommended tools, dimensions, repair plans, and urgent alerts. We review literature on pothole, crack and leak detection, highlight recent advances in large vision language models such as QwenVL and LLaVA, and describe the design of our early prototype. Experimental evaluation on public datasets and captured CCTV clips demonstrates that the system accurately identifies diverse defects and produces coherent summaries. We conclude by discussing challenges and directions for scaling the system to city wide deployments.

RODec 11, 2025
Towards Accessible Physical AI: LoRA-Based Fine-Tuning of VLA Models for Real-World Robot Control

Abdullah Yahya Abdullah Omaisan, Ibrahim Sheikh Mohamed

Vision-Language-Action (VLA) models have demonstrated remarkable capabilities in robotic manipulation,enabling robots to execute natural language commands through end-to-end learning from visual observations.However, deploying large-scale VLA models on affordable robotic platforms remains challenging due to computational constraints and the need for efficient adaptation to new robot embodiments. This paper presents an efficient fine-tuning methodology and real-world deployment analysis for adapting VLA models to low-cost robotic manipulation systems.We propose a resource-efficient fine-tuning strategy using Low-Rank Adaptation (LoRA) and quantization techniques that enable multi-billion parameter VLA models ( 3.1B parameters) to run on consumer-grade GPUs with 8GB VRAM. Our methodology addresses the critical challenge of adapting pre-trained VLA models to new robot embodiments with limited demonstration data, focusing on the trade-offs between frozen and unfrozen vision encoders. Through real-world deployment on the SO101 robotic arm for a button-pressing manipulation task, we demonstrate that our approach achieves effective manipulation performance while maintaining computational efficiency. We provide detailed analysis of deployment challenges, failure modes, and the relationship between training data quantity and real-world performance,trained on 200 demonstration episodes. Our results show that with proper fine-tuning methodology, VLA models can be successfully deployed on affordable robotic platforms,making advanced manipulation capabilities accessible beyond expensive research robots.