Walid Gomaa

CV
h-index: 1
15 papers
114 citations
Novelty: 37%
AI Score: 42

15 Papers

CV | Sep 13, 2024 | Code
Precision Aquaculture: An Integrated Computer Vision and IoT Approach for Optimized Tilapia Feeding

Rania Hossam, Ahmed Heakl, Walid Gomaa

Traditional fish farming practices often lead to inefficient feeding, resulting in environmental issues and reduced productivity. We developed an innovative system combining computer vision and IoT technologies for precise Tilapia feeding. Our solution uses real-time IoT sensors to monitor water quality parameters and computer vision algorithms to analyze fish size and count, determining optimal feed amounts. A mobile app enables remote monitoring and control. We utilized YOLOv8 for keypoint detection to estimate Tilapia weight from length, achieving 94% precision on 3,500 annotated images. Pixel-based measurements were converted to centimeters using depth estimation for accurate feeding calculations. Our method, with data collection mirroring inference conditions, significantly improved results. Preliminary estimates suggest this approach could increase production up to 58 times compared to traditional farms. Our models, code, and dataset are open-source (the code, dataset, and models are available upon reasonable request).
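The pixel-to-centimeter conversion and the length-to-weight step described in this abstract can be illustrated with a minimal sketch. It assumes a pinhole camera model and the common allometric length-weight relation W = a * L^b; the function names, the focal length, and the coefficients `a` and `b` are hypothetical placeholders, not values taken from the paper.

```python
import math

def pixel_length_to_cm(p1, p2, depth_cm, focal_px):
    """Convert the pixel distance between two keypoints to centimeters.

    Pinhole-camera assumption: real length = pixel length * depth / focal length.
    """
    pixel_len = math.dist(p1, p2)
    return pixel_len * depth_cm / focal_px

def weight_from_length(length_cm, a=0.02, b=3.0):
    """Allometric length-weight relation W = a * L**b.

    The coefficients a and b are illustrative only; real values must be
    fitted to measured fish.
    """
    return a * length_cm ** b

# Hypothetical head and tail keypoints (in pixels) with an estimated depth.
length = pixel_length_to_cm((100, 200), (400, 200), depth_cm=50.0, focal_px=1000.0)
weight = weight_from_length(length)
print(f"length = {length:.1f} cm, weight = {weight:.1f} g")
```

In practice the depth value would come from the depth-estimation step and the keypoints from the detector; both are supplied by hand here to keep the sketch self-contained.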

CC | Jan 17, 2017
On the complexity of bounded time and precision reachability for piecewise affine systems

Hugo Bazille, Olivier Bournez, Walid Gomaa et al.

Reachability for piecewise affine systems is known to be undecidable, starting from dimension $2$. In this paper we investigate the exact complexity of several decidable variants of reachability and control questions for piecewise affine systems. We show in particular that the region-to-region bounded time versions lead to $NP$-complete or co-$NP$-complete problems, starting from dimension $2$. We also prove that a bounded precision version leads to $PSPACE$-complete problems.

CL | Jun 26, 2024 | Code
ArzEn-LLM: Code-Switched Egyptian Arabic-English Translation and Speech Recognition Using LLMs

Ahmed Heakl, Youssef Zaghloul, Mennatullah Ali et al.

Motivated by the recent widespread increase in code-switching between Egyptian Arabic and English, this paper explores the intricacies of machine translation (MT) and automatic speech recognition (ASR) systems, focusing on translating code-switched Egyptian Arabic-English to either English or Egyptian Arabic. Our goal is to present the methodologies employed in developing these systems, utilizing large language models such as Llama and Gemma. In the field of ASR, we explore the utilization of the Whisper model for code-switched Egyptian Arabic recognition, detailing our experimental procedures including data preprocessing and training techniques. Through the implementation of a consecutive speech-to-text translation system that integrates ASR with MT, we aim to overcome challenges posed by limited resources and the unique characteristics of the Egyptian Arabic dialect. Evaluation against established metrics showcases promising results, with our methodologies yielding a significant improvement of 56% in English translation over the state-of-the-art and 9.3% in Arabic translation. Since code-switching is deeply inherent in spoken languages, it is crucial that ASR systems can effectively handle this phenomenon. This capability is essential for enabling seamless interaction in various domains, including business negotiations, cultural exchanges, and academic discourse. Our models and code are available as open-source resources. Code: http://github.com/ahmedheakl/arazn-llm, Models: http://huggingface.co/collections/ahmedheakl/arazn-llm-662ceaf12777656607b9524e.

CV | Jun 1, 2024 | Code
DroneVis: Versatile Computer Vision Library for Drones

Ahmed Heakl, Fatma Youssef, Victor Parque et al.

This paper introduces DroneVis, a novel library designed to automate computer vision algorithms on Parrot drones. DroneVis offers a versatile set of features and provides a diverse range of computer vision tasks along with a variety of models to choose from. Implemented in Python, the library adheres to high-quality code standards, facilitating effortless customization and feature expansion according to user requirements. In addition, comprehensive documentation is provided, encompassing usage guidelines and illustrative use cases. Our documentation, code, and examples are available at https://github.com/ahmedheakl/drone-vis.

SI | Mar 24
Network Analysis of the Egyptian Reddit Community

Samy Shaawat, Adham Hammad, Karim Farhat et al.

This paper presents a network analysis of the Reddit community focused on Egypt. We collected and constructed a comprehensive dataset consisting of 23,185 users and 105 Egyptian subreddits. Through network analysis criteria such as degree analysis, degree distribution analysis, and clustering coefficient analysis, we explored the structural properties, connectivity patterns, and local clustering within the Egyptian Reddit network. The findings provide insights into the community dynamics, influential users, and information flow within the network. Our study contributes to a better understanding of online communities in the context of Egypt and sheds light on the relationships and interactions within the Egyptian Reddit community. By leveraging network analysis techniques, we uncover the importance of individual nodes, the distribution of node degrees, and the formation of tightly knit groups.
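The degree and clustering-coefficient measures named in this abstract can be sketched on a toy undirected graph. This is a minimal illustration only: the abstract does not say how the network was built, so treating nodes as users and edges as interactions is an assumption, and the node names and helper functions are hypothetical.

```python
from itertools import combinations

def build_graph(edges):
    """Build an undirected adjacency map from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def degree(adj, node):
    """Number of distinct neighbors of a node."""
    return len(adj[node])

def clustering_coefficient(adj, node):
    """Fraction of neighbor pairs that are themselves connected."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # undefined for fewer than two neighbors; 0.0 by convention
    links = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))

# Toy graph: a triangle (a, b, c) plus one pendant node d attached to c.
adj = build_graph([("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")])
for node in sorted(adj):
    print(node, degree(adj, node), round(clustering_coefficient(adj, node), 3))
```

A degree distribution follows by counting how many nodes share each degree value, for example with `collections.Counter`.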

AI | Jan 11, 2025
Guided Code Generation with LLMs: A Multi-Agent Framework for Complex Code Tasks

Amr Almorsi, Mohanned Ahmed, Walid Gomaa

Large Language Models (LLMs) have shown remarkable capabilities in code generation tasks, yet they face significant limitations in handling complex, long-context programming challenges and demonstrating complex compositional reasoning abilities. This paper introduces a novel agentic framework for "guided code generation" that tries to address these limitations through a deliberately structured, fine-grained approach to code generation tasks. Our framework leverages LLMs' strengths as fuzzy searchers and approximate information retrievers while mitigating their weaknesses in long sequential reasoning and long-context understanding. Empirical evaluation using OpenAI's HumanEval benchmark with Meta's Llama 3.1 8B model (int4 precision) demonstrates a 23.79% improvement in solution accuracy compared to direct one-shot generation. Our results indicate that structured, guided approaches to code generation can significantly enhance the practical utility of LLMs in software development while overcoming their inherent limitations in compositional reasoning and context handling.

CV | Feb 7, 2025
Invizo: Arabic Handwritten Document Optical Character Recognition Solution

Alhossien Waly, Bassant Tarek, Ali Feteha et al.

Converting images of Arabic text into plain text is a widely researched topic in academia and industry. However, recognition of Arabic handwritten and printed text presents difficult challenges due to the complex variations of the Arabic script. This work proposes an end-to-end solution that recognizes Arabic handwritten text, printed text, and Arabic numerals and presents the data in a structured manner. We reached 81.66% precision, 78.82% recall, and 79.07% F-measure on the text detection task that powers the proposed solution. The proposed recognition model incorporates state-of-the-art CNN-based feature extraction and Transformer-based sequence modeling to accommodate variations in handwriting styles, stroke thicknesses, alignments, and noise conditions. The evaluation of the model suggests strong performance on both printed and handwritten text, yielding 0.59% CER and 1.72% WER on printed text, and 7.91% CER and 31.41% WER on handwritten text. The overall solution has proven reliable in real-life OCR tasks. It is equipped with both detection and recognition models as well as auxiliary feature extraction and matching algorithms. Its general-purpose implementation makes the solution applicable to any Arabic handwritten or printed document or receipt, so it is practical in a wide range of contexts.
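The CER and WER figures quoted in this abstract are conventionally computed as an edit distance normalized by the reference length. The sketch below shows that standard definition; it is not the authors' evaluation code, the function names are hypothetical, and it assumes a non-empty reference.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or word lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def cer(reference, hypothesis):
    """Character error rate: character edits divided by reference length."""
    return edit_distance(reference, hypothesis) / len(reference)

def wer(reference, hypothesis):
    """Word error rate: word edits divided by reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

print(cer("abcd", "abed"))               # one substituted character out of four
print(wer("the cat sat", "the cat sit"))  # one substituted word out of three
```

Because a single wrong character makes the whole word wrong, WER is typically much higher than CER on the same output, which is consistent with the gap between the two metrics reported for handwritten text.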

SD | Oct 25, 2024
Arabic Music Classification and Generation using Deep Learning

Mohamed Elshaarawy, Ashrakat Saeed, Mariam Sheta et al.

This paper proposes a machine learning approach for classifying classical and new Egyptian music by composer and generating new similar music. The proposed system utilizes a convolutional neural network (CNN) for classification and a CNN autoencoder for generation. The dataset used in this project consists of new and classical Egyptian music pieces composed by different composers. To classify the music by composer, each sample is normalized and transformed into a mel spectrogram. The CNN model is trained on the dataset using the mel spectrograms as input features and the composer labels as output classes. The model achieves 81.4% accuracy in classifying the music by composer, demonstrating the effectiveness of the proposed approach. To generate new music similar to the original pieces, a CNN autoencoder is trained on a similar dataset. The model is trained to encode the mel spectrograms of the original pieces into a lower-dimensional latent space and then decode them back into the original mel spectrogram. The generated music is produced by sampling from the latent space and decoding the samples back into mel spectrograms, which are then transformed into audio. In conclusion, the proposed system provides a promising approach to classifying and generating classical Egyptian music, which can be applied in various musical applications, such as music recommendation systems, music production, and music education.

CV | Dec 2, 2025
Defense That Attacks: How Robust Models Become Better Attackers

Mohamed Awad, Mahmoud Akrm, Walid Gomaa

Deep learning has achieved great success in computer vision, but remains vulnerable to adversarial attacks. Adversarial training is the leading defense designed to improve model robustness. However, its effect on the transferability of attacks is underexplored. In this work, we ask whether adversarial training unintentionally increases the transferability of adversarial examples. To answer this, we trained a diverse zoo of 36 models, including CNNs and ViTs, and conducted comprehensive transferability experiments. Our results reveal a clear paradox: adversarially trained (AT) models produce perturbations that transfer more effectively than those from standard models, which introduces a new ecosystem risk. To enable reproducibility and further study, we release all models, code, and experimental scripts. Furthermore, we argue that robustness evaluations should assess not only the resistance of a model to transferred attacks but also its propensity to produce transferable adversarial examples.

LG | Aug 29, 2021
Markov Switching Model for Driver Behavior Prediction: Use cases on Smartphones

Ahmed B. Zaky, Mohamed A. Khamis, Walid Gomaa

Several intelligent transportation systems focus on studying driver behavior for numerous objectives. This includes the ability to analyze driver actions, sensitivity, distraction, and response time. As data collection is one of the major concerns for learning and validating different driving situations, we present a driver behavior switching model validated by a low-cost data collection solution using smartphones. The proposed model is validated using a real dataset to predict driver behavior over short time periods. A literature survey on motion detection (specifically driving behavior detection using smartphones) is presented. Multiple Markov Switching Variable Auto-Regression (MSVAR) models are implemented to achieve a sophisticated fit to the collected driver behavior data. This yields more accurate predictions not only for driver behavior but also for the entire driving situation. The performance of the presented models, together with suitable model selection criteria, is also presented. The proposed driver behavior prediction framework can potentially be used in accident prediction and driver safety systems.

LG | Apr 21, 2021
Bearings Fault Detection Using Hidden Markov Models and Principal Component Analysis Enhanced Features

Akthem Rehab, Islam Ali, Walid Gomaa et al.

Asset health monitoring continues to be of increasing importance for productivity, reliability, and cost reduction. Early fault detection is a keystone of health management as part of the emerging Prognostics and Health Management (PHM) philosophy. This paper proposes a Hidden Markov Model (HMM) to assess machine health degradation. Principal Component Analysis (PCA) is used to enhance features extracted from vibration signals. The enhanced features capture the second-order structure of the data. The experimental results based on a bearing test bed show the plausibility of the proposed method.

NI | Jul 7, 2020
CrossCount: A Deep Learning System for Device-free Human Counting using WiFi

Osama T. Ibrahim, Walid Gomaa, Moustafa Youssef

Counting humans is an essential part of many people-centric applications. In this paper, we propose CrossCount: an accurate deep-learning-based human count estimator that uses a single WiFi link to estimate the human count in an area of interest. The main idea is to depend on the temporal link-blockage pattern as a discriminant feature that is more robust to wireless channel noise than the signal strength, hence delivering a ubiquitous and accurate human counting system. As part of its design, CrossCount addresses a number of deep learning challenges such as class imbalance and training data augmentation for enhancing the model generalizability. Implementation and evaluation of CrossCount in multiple testbeds show that it can achieve a human counting accuracy to within a maximum of 2 persons 100% of the time. This highlights the promise of CrossCount as a ubiquitous crowd estimator with non-labour-intensive data collection from off-the-shelf devices.

CY | Jun 14, 2019
Trans-Sense: Real Time Transportation Schedule Estimation Using Smart Phones

Ali AbdelAziz, Amin Shoukry, Walid Gomaa et al.

Developing countries suffer from traffic congestion, poorly planned road/rail networks, and lack of access to public transportation facilities. This context results in an increase in fuel consumption, pollution level, monetary losses, massive delays, and less productivity. It also has a negative impact on commuters' feelings and moods. Availability of real-time transit information - by providing public transportation vehicles' locations using GPS devices - helps in estimating a passenger's waiting time and addressing the above issues. However, such a solution is expensive for developing countries. This paper aims at designing and implementing a crowd-sourced, mobile-phone-based solution to estimate the expected waiting time of a passenger in public transit systems, to predict the remaining time to get on/off a vehicle, and to construct a real-time public transit schedule. Trans-Sense has been evaluated using real data collected for over 800 hours, on a daily basis, by different Android phones, and using different light rail transit lines at different time spans. The results show that Trans-Sense can achieve an average recall and precision of 95.35% and 90.1%, respectively, in discriminating light rail stations. Moreover, the empirical distributions governing the different time delays affecting a passenger's total trip time enable predicting the right time of arrival of a passenger to her destination with an accuracy of 91.81%. In addition, the system estimates the stations' dimensions with an accuracy of 95.71%.

SY | Aug 27, 2018
MARL-FWC: Optimal Coordination of Freeway Traffic Control Measures

Ahmed Fares, Walid Gomaa, Mohamed A. Khamis

The objective of this article is to optimize the overall traffic flow on freeways using multiple ramp metering controls together with complementary Dynamic Speed Limits (DSLs). An optimal freeway operation can be reached when minimizing the difference between the freeway density and the critical ratio for maximum traffic flow. In this article, a Multi-Agent Reinforcement Learning for Freeways Control (MARL-FWC) system for ramp metering and DSLs is proposed. MARL-FWC introduces a new microscopic framework at the network level based on collaborative Markov Decision Process modeling (Markov game) and an associated cooperative Q-learning algorithm. The technique incorporates payoff propagation (Max-Plus algorithm) under the coordination graphs framework, particularly suited for optimal control purposes. MARL-FWC provides three control designs: fully independent, fully distributed, and centralized, each suited for different network architectures. MARL-FWC was extensively tested in order to assess the proposed model of the joint payoff, as well as the global payoff. Experiments are conducted with heavy traffic flow under the renowned VISSIM traffic simulator to evaluate MARL-FWC. The experimental results show a significant decrease in the total travel time and an increase in the average speed (when compared with the base case) while maintaining an optimal traffic flow.

BM | Aug 23, 2016
Deep learning is competing random forest in computational docking

Mohamed Khamis, Walid Gomaa, Basem Galal

Computational docking is the core process of computer-aided drug design; it aims at predicting the best orientation and conformation of a small drug molecule when bound to a target large protein receptor. The docking quality is typically measured by a scoring function: a mathematical predictive model that produces a score representing the binding free energy and hence the stability of the resulting complex molecule. We analyze the performance of both learning techniques on the scoring power, the ranking power, docking power, and screening power using the PDBbind 2013 database. For the scoring and ranking powers, the proposed learning scoring functions depend on a wide range of features (energy terms, pharmacophore, intermolecular) that entirely characterize the protein-ligand complexes. For the docking and screening powers, the proposed learning scoring functions depend on the intermolecular features of the RF-Score to utilize a larger number of training complexes. For the scoring power, the DL_RF scoring function achieves a Pearson's correlation coefficient between the predicted and experimentally measured binding affinities of 0.799 versus 0.758 for the RF scoring function. For the ranking power, the DL scoring function ranks the ligands bound to a fixed target protein with accuracy 54% for the high-level ranking and 78% for the low-level ranking, while the RF scoring function achieves 46% and 62%, respectively. For the docking power, the DL_RF scoring function has a success rate of 36.0%, when the three best-scored ligand binding poses are considered within 2 Å root-mean-square deviation from the native pose, versus 30.2% for the RF scoring function. For the screening power, the DL scoring function has an average enrichment factor and success rate at the top 1% level of 2.69 and 6.45%, respectively, versus 1.61 and 4.84%, respectively, for the RF scoring function.
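The scoring power reported in this abstract is a Pearson correlation between predicted and measured binding affinities. The following minimal sketch shows that standard formula; the function name and the affinity values are made up for illustration and are not data from the paper.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Hypothetical predicted vs. experimentally measured binding affinities.
predicted = [5.1, 6.3, 7.0, 4.2, 8.5]
measured = [5.0, 6.0, 7.4, 4.6, 8.1]
print(round(pearson_r(predicted, measured), 3))
```

A value of 1.0 means the predictions are a perfect increasing linear function of the measurements, so the reported 0.799 versus 0.758 indicates a modestly tighter linear agreement for the DL_RF scoring function.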