Mohamed Mahmoud

CR
h-index12
28papers
807citations
Novelty41%
AI Score55

28 Papers

CVNov 15, 2022Code
Deep learning for table detection and structure recognition: A survey

Mahmoud Kasem, Abdelrahman Abdallah, Alexander Berendeyev et al.

Tables are everywhere, from scientific journals, papers, websites, and newspapers all the way to items we buy at the supermarket. Detecting them is thus of utmost importance to automatically understanding the content of a document. The performance of table detection has substantially increased thanks to the rapid development of deep learning networks. The goals of this survey are to provide a profound comprehension of the major developments in the field of Table Detection, offer insight into the different methodologies, and provide a systematic taxonomy of the different approaches. Furthermore, we provide an analysis of both classic and new applications in the field. Lastly, the datasets and source code of the existing models are organized to provide the reader with a compass on this vast literature. Finally, we go over the architecture of utilizing various object detection and table structure recognition methods to create an effective and efficient system, as well as a set of development trends to keep up with state-of-the-art algorithms and future research. We have also set up a public GitHub repository where we will be updating the most recent publications, open data, and source code. The GitHub repository is available at https://github.com/abdoelsayed2016/table-detection-structure-recognition.

CRApr 20
RAVEN: Retrieval-Augmented Vulnerability Exploration Network for Memory Corruption Analysis in User Code and Binary Programs

Parteek Jamwal, Minghao Shao, Boyuan Chen et al.

Large Language Models (LLMs) have demonstrated remarkable capabilities across various cybersecurity tasks, including vulnerability classification, detection, and patching. However, their potential in automated vulnerability report documentation and analysis remains underexplored. We present RAVEN (Retrieval Augmented Vulnerability Exploration Network), a framework leveraging LLM agents and Retrieval Augmented Generation (RAG) to synthesize comprehensive vulnerability analysis reports. Given vulnerable source code, RAVEN generates reports following the Google Project Zero Root Cause Analysis template. The framework uses four modules: an Explorer agent for vulnerability identification, a RAG engine retrieving relevant knowledge from curated databases including Google Project Zero reports and CWE entries, an Analyst agent for impact and exploitation assessment, and a Reporter agent for structured report generation. To ensure quality, RAVEN includes a task specific LLM Judge evaluating reports across structural integrity, ground truth alignment, code reasoning quality, and remediation quality. We evaluate RAVEN on 105 vulnerable code samples covering 15 CWE types from the NIST-SARD dataset. Results show an average quality score of 54.21%, supporting the effectiveness of our approach for automated vulnerability documentation.

CLMar 26, 2024Code
ArabicaQA: A Comprehensive Dataset for Arabic Question Answering

Abdelrahman Abdallah, Mahmoud Kasem, Mahmoud Abdalla et al.

In this paper, we address the significant gap in Arabic natural language processing (NLP) resources by introducing ArabicaQA, the first large-scale dataset for machine reading comprehension and open-domain question answering in Arabic. This comprehensive dataset, consisting of 89,095 answerable and 3,701 unanswerable questions created by crowdworkers to look similar to answerable ones, along with additional labels of open-domain questions marks a crucial advancement in Arabic NLP resources. We also present AraDPR, the first dense passage retrieval model trained on the Arabic Wikipedia corpus, specifically designed to tackle the unique challenges of Arabic text retrieval. Furthermore, our study includes extensive benchmarking of large language models (LLMs) for Arabic question answering, critically evaluating their performance in the Arabic language context. In conclusion, ArabicaQA, AraDPR, and the benchmarking of LLMs in Arabic question answering offer significant advancements in the field of Arabic NLP. The dataset and code are publicly accessible for further research https://github.com/DataScienceUIBK/ArabicaQA.

CVMar 22
Representation-Level Adversarial Regularization for Clinically Aligned Multitask Thyroid Ultrasound Assessment

Dina Salama, Mohamed Mahmoud, Nourhan Bayasi et al.

Thyroid ultrasound is the first-line exam for assessing thyroid nodules and determining whether biopsy is warranted. In routine reporting, radiologists produce two coupled outputs: a nodule contour for measurement and a TI-RADS risk category based on sonographic criteria. Yet both contouring style and risk grading vary across readers, creating inconsistent supervision that can degrade standard learning pipelines. In this paper, we address this workflow with a clinically guided multitask framework that jointly predicts the nodule mask and TI-RADS category within a single model. To ground risk prediction in clinically meaningful evidence, we guide the classification embedding using a compact TI-RADS aligned radiomics target during training, while preserving complementary deep features for discriminative performance. However, under annotator variability, naive multitask optimization often fails not because the tasks are unrelated, but because their gradients compete within the shared representation. To make this competition explicit and controllable, we introduce RLAR, a representation-level adversarial gradient regularizer. Rather than performing parameter-level gradient surgery, RLAR uses each task's normalized adversarial direction in latent space as a geometric probe of task sensitivity and penalizes excessive angular alignment between task-specific adversarial directions. On a public TI-RADS dataset, our clinically guided multitask model with RLAR consistently improves risk stratification while maintaining segmentation quality compared to single-task training and conventional multitask baselines. Code and pretrained models will be released.

IRApr 8Code
MARVEL: Multimodal Adaptive Reasoning-intensiVe Expand-rerank and retrievaL

Mahmoud SalahEldin Kasem, Mohamed Mahmoud, Mostafa Farouk Senussi et al.

Multimodal retrieval over text corpora remains a fundamental challenge: the best vision-language encoder achieves only 27.6 nDCG@10 on MM-BRIGHT, a reasoning-intensive multimodal retrieval benchmark, underperforming strong text-only systems. We argue that effective multimodal retrieval requires three tightly integrated capabilities that existing approaches address only in isolation: expanding the query's latent intent, retrieving with a model trained for complex reasoning, and reranking via explicit step-by-step reasoning over candidates. We introduce \textbf{MARVEL} (\textbf{M}ultimodal \textbf{A}daptive \textbf{R}easoning-intensi\textbf{V}e \textbf{E}xpand-rerank and retrieva\textbf{L}), a unified pipeline that combines LLM-driven query expansion, \textbf{MARVEL-Retriever} -- a reasoning-enhanced dense retriever fine-tuned for complex multimodal queries -- and GPT-4o-based chain-of-thought reranking with optional multi-pass reciprocal rank fusion. Evaluated on MM-BRIGHT across 29 technical domains, MARVEL achieves \textbf{37.9} nDCG@10, surpassing the best multimodal encoder by \textbf{+10.3 points} and outperforming all single-stage baselines in 27 of 29 domains and matching or approaching the best baseline in the remaining two highly-specialized domains (Crypto, Quantum Computing), demonstrating that reasoning-intensive multimodal retrieval is best addressed through a unified expand-retrieve-rerank framework. https://github.com/mm-bright/multimodal-reasoning-retrieval

IRApr 8Code
HIVE: Query, Hypothesize, Verify An LLM Framework for Multimodal Reasoning-Intensive Retrieval

Mahmoud Abdalla, Mahmoud SalahEldin Kasem, Mohamed Mahmoud et al.

Multimodal retrieval models fail on reasoning-intensive queries where images (diagrams, charts, screenshots) must be deeply integrated with text to identify relevant documents -- the best multimodal model achieves only 27.6 nDCG@10 on MM-BRIGHT, underperforming even strong text-only retrievers (32.2). We introduce \textbf{HIVE} (\textbf{H}ypothesis-driven \textbf{I}terative \textbf{V}isual \textbf{E}vidence Retrieval), a plug-and-play framework that injects explicit visual-text reasoning into a retriever via LLMs. HIVE operates in four stages: (1) initial retrieval over the corpus, (2) LLM-based compensatory query synthesis that explicitly articulates visual and logical gaps observed in top-$k$ candidates, (3) secondary retrieval with the refined query, and (4) LLM verification and reranking over the union of candidates. Evaluated on the multimodal-to-text track of MM-BRIGHT (2,803 real-world queries across 29 technical domains), HIVE achieves a new state-of-the-art aggregated nDCG@10 of \textbf{41.7} -- a \textbf{+9.5} point gain over the best text-only model (DiVeR: 32.2) and \textbf{+14.1} over the best multimodal model (Nomic-Vision: 27.6), where our reasoning-enhanced base retriever contributes 33.2 and the HIVE framework adds a further \textbf{+8.5} points -- with particularly strong results in visually demanding domains (Gaming: 68.2, Chemistry: 42.5, Sustainability: 49.4). Compatible with both standard and reasoning-enhanced retrievers, HIVE demonstrates that LLM-mediated visual hypothesis generation and verification can substantially close the multimodal reasoning gap in retrieval. https://github.com/mm-bright/multimodal-reasoning-retrieval

IRApr 8Code
BRIDGE: Multimodal-to-Text Retrieval via Reinforcement-Learned Query Alignment

Mohamed Darwish Mounis, Mohamed Mahmoud, Shaimaa Sedek et al.

Multimodal retrieval systems struggle to resolve image-text queries against text-only corpora: the best vision-language encoder achieves only 27.6 nDCG@10 on MM-BRIGHT, underperforming strong text-only retrievers. We argue the bottleneck is not the retriever but the query -- raw multimodal queries entangle visual descriptions, conversational noise, and retrieval intent in ways that systematically degrade embedding similarity. We present \textbf{BRIDGE}, a two-component system that resolves this mismatch without multimodal encoders. \textbf{FORGE} (\textbf{F}ocused Retrieval Query Generato\textbf{r}) is a query alignment model trained via reinforcement learning, which distills noisy multimodal queries into compact, retrieval-optimized search strings. \textbf{LENS} (\textbf{L}anguage-\textbf{E}nhanced \textbf{N}eural \textbf{S}earch) is a reasoning-enhanced dense retriever fine-tuned on reasoning-intensive retrieval data to handle the intent-rich queries FORGE produces. Evaluated on MM-BRIGHT (2,803 queries, 29 domains), BRIDGE achieves \textbf{29.7} nDCG@10, surpassing all multimodal encoder baselines including Nomic-Vision (27.6). When FORGE is applied as a plug-and-play aligner on top of Nomic-Vision, the combined system reaches \textbf{33.3} nDCG@10 -- exceeding the best text-only retriever (32.2) -- demonstrating that \textit{query alignment} is the key bottleneck in multimodal-to-text retrieval. https://github.com/mm-bright/multimodal-reasoning-retrieval

CVJul 18, 2025Code
Foundation Models as Class-Incremental Learners for Dermatological Image Classification

Mohamed Elkhayat, Mohamed Mahmoud, Jamil Fayyad et al.

Class-Incremental Learning (CIL) aims to learn new classes over time without forgetting previously acquired knowledge. The emergence of foundation models (FM) pretrained on large datasets presents new opportunities for CIL by offering rich, transferable representations. However, their potential for enabling incremental learning in dermatology remains largely unexplored. In this paper, we systematically evaluate frozen FMs pretrained on large-scale skin lesion datasets for CIL in dermatological disease classification. We propose a simple yet effective approach where the backbone remains frozen, and a lightweight MLP is trained incrementally for each task. This setup achieves state-of-the-art performance without forgetting, outperforming regularization, replay, and architecture based methods. To further explore the capabilities of frozen FMs, we examine zero training scenarios using nearest mean classifiers with prototypes derived from their embeddings. Through extensive ablation studies, we demonstrate that this prototype based variant can also achieve competitive results. Our findings highlight the strength of frozen FMs for continual learning in dermatology and support their broader adoption in real world medical applications. Our code and datasets are available here.

CVJun 6, 2024Code
ReceiptSense: Beyond Traditional OCR -- A Dataset for Receipt Understanding

Abdelrahman Abdallah, Mohamed Mounis, Mahmoud Abdalla et al.

Multilingual OCR and information extraction from receipts remains challenging, particularly for complex scripts like Arabic. We introduce \dataset, a comprehensive dataset designed for Arabic-English receipt understanding comprising 20,000 annotated receipts from diverse retail settings, 30,000 OCR-annotated images, and 10,000 item-level annotations, and a new Receipt QA subset with 1265 receipt images paired with 40 question-answer pairs each to support LLM evaluation for receipt understanding. The dataset captures merchant names, item descriptions, prices, receipt numbers, and dates to support object detection, OCR, and information extraction tasks. We establish baseline performance using traditional methods (Tesseract OCR) and advanced neural networks, demonstrating the dataset's effectiveness for processing complex, noisy real-world receipt layouts. Our publicly accessible dataset advances automated multilingual document processing research (see https://github.com/Update-For-Integrated-Business-AI/CORU ).

CVDec 19, 2023
Advancements and Challenges in Arabic Optical Character Recognition: A Comprehensive Survey

Mahmoud SalahEldin Kasem, Mohamed Mahmoud, Hyun-Soo Kang

Optical character recognition (OCR) is a vital process that involves the extraction of handwritten or printed text from scanned or printed images, converting it into a format that can be understood and processed by machines. This enables further data processing activities such as searching and editing. The automatic extraction of text through OCR plays a crucial role in digitizing documents, enhancing productivity, improving accessibility, and preserving historical records. This paper seeks to offer an exhaustive review of contemporary applications, methodologies, and challenges associated with Arabic Optical Character Recognition (OCR). A thorough analysis is conducted on prevailing techniques utilized throughout the OCR process, with a dedicated effort to discern the most efficacious approaches that demonstrate enhanced outcomes. To ensure a thorough evaluation, a meticulous keyword-search methodology is adopted, encompassing a comprehensive analysis of articles relevant to Arabic OCR, including both backward and forward citation reviews. In addition to presenting cutting-edge techniques and methods, this paper critically identifies research gaps within the realm of Arabic OCR. By highlighting these gaps, we shed light on potential areas for future exploration and development, thereby guiding researchers toward promising avenues in the field of Arabic OCR. The outcomes of this study provide valuable insights for researchers, practitioners, and stakeholders involved in Arabic OCR, ultimately fostering advancements in the field and facilitating the creation of more accurate and efficient OCR systems for the Arabic language.

CVMay 9, 2024
A Comprehensive Survey of Masked Faces: Recognition, Detection, and Unmasking

Mohamed Mahmoud, Mahmoud SalahEldin Kasem, Hyun-Soo Kang

Masked face recognition (MFR) has emerged as a critical domain in biometric identification, especially by the global COVID-19 pandemic, which introduced widespread face masks. This survey paper presents a comprehensive analysis of the challenges and advancements in recognising and detecting individuals with masked faces, which has seen innovative shifts due to the necessity of adapting to new societal norms. Advanced through deep learning techniques, MFR, along with Face Mask Recognition (FMR) and Face Unmasking (FU), represent significant areas of focus. These methods address unique challenges posed by obscured facial features, from fully to partially covered faces. Our comprehensive review delves into the various deep learning-based methodologies developed for MFR, FMR, and FU, highlighting their distinctive challenges and the solutions proposed to overcome them. Additionally, we explore benchmark datasets and evaluation metrics specifically tailored for assessing performance in MFR research. The survey also discusses the substantial obstacles still facing researchers in this field and proposes future directions for the ongoing development of more robust and effective masked face recognition systems. This paper serves as an invaluable resource for researchers and practitioners, offering insights into the evolving landscape of face recognition technologies in the face of global health crises and beyond.

CRApr 6, 2021
Efficient and Privacy-Preserving Infection Control System for Covid-19-Like Pandemics using Blockchain

Seham A. Alansar, Mahmoud M. Badr, Mohamed Mahmoud et al.

Contact tracing is a very effective way to control the COVID-19-like pandemics. It aims to identify individuals who closely contacted an infected person during the incubation period of the virus and notify them to quarantine. However, the existing systems suffer from privacy, security, and efficiency issues. To address these limitations, in this paper, we propose an efficient and privacy-preserving Blockchain-based infection control system. Instead of depending on a single authority to run the system, a group of health authorities, that form a consortium Blockchain, run our system. Using Blockchain technology not only secures our system against single point of failure and denial of service attacks, but also brings transparency because all transactions can be validated by different parties. Although contact tracing is important, it is not enough to effectively control an infection. Thus, unlike most of the existing systems that focus only on contact tracing, our system consists of three integrated subsystems, including contact tracing, public places access control, and safe-places recommendation. The access control subsystem prevents infected people from visiting public places to prevent spreading the virus, and the recommendation subsystem categorizes zones based on the infection level so that people can avoid visiting contaminated zones. Our analysis demonstrates that our system is secure and preserves the privacy of the users against identification, social graph disclosure, and tracking attacks, while thwarting false reporting (or panic) attacks. Moreover, our extensive performance evaluations demonstrate the scalability of our system (which is desirable in pandemics) due to its low communication, computation, and storage overheads.

CRDec 2, 2020
Detection of False-Reading Attacks in the AMI Net-Metering System

Mahmoud M. Badr, Mohamed I. Ibrahem, Mohamed Mahmoud et al.

In smart grid, malicious customers may compromise their smart meters (SMs) to report false readings to achieve financial gains illegally. Reporting false readings not only causes hefty financial losses to the utility but may also degrade the grid performance because the reported readings are used for energy management. This paper is the first work that investigates this problem in the net-metering system, in which one SM is used to report the difference between the power consumed and the power generated. First, we prepare a benign dataset for the net-metering system by processing a real power consumption and generation dataset. Then, we propose a new set of attacks tailored for the net-metering system to create malicious dataset. After that, we analyze the data and we found time correlations between the net meter readings and correlations between the readings and relevant data obtained from trustworthy sources such as the solar irradiance and temperature. Based on the data analysis, we propose a general multi-data-source deep hybrid learning-based detector to identify the false-reading attacks. Our detector is trained on net meter readings of all customers besides data from the trustworthy sources to enhance the detector performance by learning the correlations between them. The rationale here is that although an attacker can report false readings, he cannot manipulate the solar irradiance and temperature values because they are beyond his control. Extensive experiments have been conducted, and the results indicate that our detector can identify the false-reading attacks with high detection rate and low false alarm.

CRNov 7, 2020
Privacy-Preserving and Efficient Data Collection Scheme for AMI Networks Using Deep Learning

Mohamed I. Ibrahem, Mohamed Mahmoud, Mostafa M. Fouda et al.

In advanced metering infrastructure (AMI), smart meters (SMs), which are installed at the consumer side, send fine-grained power consumption readings periodically to the electricity utility for load monitoring and energy management. Change and transmit (CAT) is an efficient approach to collect these readings, where the readings are not transmitted when there is no enough change in consumption. However, this approach causes a privacy problem that is by analyzing the transmission pattern of an SM, sensitive information on the house dwellers can be inferred. For instance, since the transmission pattern is distinguishable when dwellers are on travel, attackers may analyze the pattern to launch a presence-privacy attack (PPA) to infer whether the dwellers are absent from home. In this paper, we propose a scheme, called "STDL", for efficient collection of power consumption readings in AMI networks while preserving the consumers' privacy by sending spoofing transmissions (redundant real readings) using a deep-learning approach. We first use a clustering technique and real power consumption readings to create a dataset for transmission patterns using the CAT approach. Then, we train an attacker model using deep-learning, and our evaluations indicate that the success rate of the attacker is about 91%. Finally, we train a deep-learning-based defense model to send spoofing transmissions efficiently to thwart the PPA. Extensive evaluations are conducted, and the results indicate that our scheme can reduce the attacker's success rate, to 13.52% in case he knows the defense model and to 3.15% in case he does not know the model, while still achieving high efficiency in terms of the number of readings that should be transmitted. Our measurements indicate that the proposed scheme can reduce the number of readings that should be transmitted by about 41% compared to continuously transmitting readings.

CRMay 28, 2020
Detection of Lying Electrical Vehicles in Charging Coordination Application Using Deep Learning

Ahmed Shafee, Mostafa M. Fouda, Mohamed Mahmoud et al.

The simultaneous charging of many electric vehicles (EVs) stresses the distribution system and may cause grid instability in severe cases. The best way to avoid this problem is by charging coordination. The idea is that the EVs should report data (such as state-of-charge (SoC) of the battery) to run a mechanism to prioritize the charging requests and select the EVs that should charge during this time slot and defer other requests to future time slots. However, EVs may lie and send false data to receive high charging priority illegally. In this paper, we first study this attack to evaluate the gains of the lying EVs and how their behavior impacts the honest EVs and the performance of charging coordination mechanism. Our evaluations indicate that lying EVs have a greater chance to get charged comparing to honest EVs and they degrade the performance of the charging coordination mechanism. Then, an anomaly based detector that is using deep neural networks (DNN) is devised to identify the lying EVs. To do that, we first create an honest dataset for charging coordination application using real driving traces and information revealed by EV manufacturers, and then we also propose a number of attacks to create malicious data. We trained and evaluated two models, which are the multi-layer perceptron (MLP) and the gated recurrent unit (GRU) using this dataset and the GRU detector gives better results. Our evaluations indicate that our detector can detect lying EVs with high accuracy and low false positive rate.

CRMay 28, 2020
Efficient Privacy-Preserving Electricity Theft Detection with Dynamic Billing and Load Monitoring for AMI Networks

Mohamed I. Ibrahem, Mahmoud Nabil, Mostafa M. Fouda et al.

In advanced metering infrastructure (AMI), smart meters (SMs) are installed at the consumer side to send fine-grained power consumption readings periodically to the system operator (SO) for load monitoring, energy management, billing, etc. However, fraudulent consumers launch electricity theft cyber-attacks by reporting false readings to reduce their bills illegally. These attacks do not only cause financial losses but may also degrade the grid performance because the readings are used for grid management. To identify these attackers, the existing schemes employ machine-learning models using the consumers' fine-grained readings, which violates the consumers' privacy by revealing their lifestyle. In this paper, we propose an efficient scheme that enables the SO to detect electricity theft, compute bills, and monitor load while preserving the consumers' privacy. The idea is that SMs encrypt their readings using functional encryption, and the SO uses the ciphertexts to (i) compute the bills following dynamic pricing approach, (ii) monitor the grid load, and (iii) evaluate a machine-learning model to detect fraudulent consumers, without being able to learn the individual readings to preserve consumers' privacy. We adapted a functional encryption scheme so that the encrypted readings are aggregated for billing and load monitoring and only the aggregated value is revealed to the SO. Also, we exploited the inner-product operations on encrypted readings to evaluate a machine-learning model to detect fraudulent consumers. Real dataset is used to evaluate our scheme, and our evaluations indicate that our scheme is secure and can detect fraudulent consumers accurately with low communication and computation overhead.

LGDec 24, 2019
On Sharing Models Instead of Data using Mimic learning for Smart Health Applications

Mohamed Baza, Andrew Salazar, Mohamed Mahmoud et al.

Electronic health records (EHR) systems contain vast amounts of medical information about patients. These data can be used to train machine learning models that can predict health status, as well as to help prevent future diseases or disabilities. However, getting patients' medical data to obtain well-trained machine learning models is a challenging task. This is because sharing the patients' medical records is prohibited by law in most countries due to patients privacy concerns. In this paper, we tackle this problem by sharing the models instead of the original sensitive data by using the mimic learning approach. The idea is first to train a model on the original sensitive data, called the teacher model. Then, using this model, we can transfer its knowledge to another model, called the student model, without the need to learn the original data used in training the teacher model. The student model is then shared to the public and can be used to make accurate predictions. To assess the mimic learning approach, we have evaluated our scheme using different medical datasets. The results indicate that the student model mimics the teacher model performance in terms of prediction accuracy without the need to access to the patients' original data records.

CRJun 21, 2019
B-Ride: Ride Sharing with Privacy-preservation, Trust and Fair Payment atop Public Blockchain

Mohamed Baza, Noureddine Lasla, Mohamed Mahmoud et al.

Ride-sharing is a service that enables drivers to share their trips with other riders, contributing to appealing benefits of shared travel costs. However, the majority of existing platforms rely on a central third party, which make them subject to a single point of failure and privacy disclosure issues. Moreover, they are vulnerable to DDoS and Sybil attacks due to malicious users involvement. Besides, high fees should be paid to the service provider. In this paper, we propose a decentralized ride-sharing service based on public Blockchain, named B-Ride. Both riders and drivers can find rides match while preserving their trip data, including pick-up/drop-off location, and departure/arrival date. However, under the anonymity of the public blockchain, a malicious user may submit multiple ride requests or offers, while not committing to any of them, to discover better offer or to make the system unreliable. B-Ride solves this problem by introducing a time-locked deposit protocol for a ride-sharing by leveraging smart contract and zero-knowledge set membership proof. In a nutshell, both a driver and a rider have to show their commitment by sending a deposit to the blockchain. Later, a driver has to prove to the blockchain on the agreed departure time that he has arrived at the pick-up location. To preserve rider/driver location privacy by hiding the exact pick-up location, the proof is done using zero-knowledge set membership protocol. Moreover, to ensure a fair payment, a pay-as-you-drive methodology is introduced based on the elapsed distance of the driver and the rider. Also, we introduce a reputation-based trust model to rate drivers based on their past trips to allow riders to select them based on their history on the system. Finally, we implement B-Ride in a test net of Ethereum. The experiment results show the applicability of our protocol atop the existing real-world blockchain.

CRMay 12, 2019
Privacy-Preserving and Collusion-Resistant Charging Coordination Schemes for Smart Grid

Mohamed Baza, Marbin Pazos-Revilla, Mahmoud Nabil et al.

Energy storage units (ESUs) including EVs and home batteries enable several attractive features of the modern smart grids such as effective demand response and reduced electric bills. However, uncoordinated charging of ESUs stresses the power system. In this paper, we propose privacy-preserving and collusion-resistant charging coordination centralized and decentralized schemes for the smart grid. The centralized scheme is used in case of robust communication infrastructure that connects the ESUs to the utility, while the decentralized scheme is useful in case of infrastructure not available or costly. In the centralized scheme, each energy storage unit should acquire anonymous tokens from a charging controller (CC) to send multiple charging requests to the CC via the aggregator. CC can use the charging requests to enough data to run the charging coordination scheme, but it cannot link the data to particular ESUs or reveal any private information. Our centralized scheme uses a modified knapsack problem formulation technique to maximize the amount of power delivered to the ESUs before the charging requests expire without exceeding the available maximum charging capacity. In the decentralized scheme, several ESUs run the scheme in a distributed way with no need to aggregator or CC. One ESU is selected as a head node that should decrypt the ciphertext of the aggregated messages of the ESUs' messages and broadcast it to the community while not revealing the ESUs' individual charging demands. Then, ESUs can coordinate charging requests based on the aggregated charging demand while not exceeding the maximum charging capacity. Extensive experiments and simulations are conducted to demonstrate that our schemes are efficient and secure against various attacks, and can preserve ESU owner's privacy.

CRMay 2, 2019
Mimic Learning to Generate a Shareable Network Intrusion Detection Model

Ahmed Shafee, Mohamed Baza, Douglas A. Talbert et al.

Purveyors of malicious network attacks continue to increase the complexity and the sophistication of their techniques, and their ability to evade detection continues to improve as well. Hence, intrusion detection systems must also evolve to meet these increasingly challenging threats. Machine learning is often used to support this needed improvement. However, training a good prediction model can require a large set of labelled training data. Such datasets are difficult to obtain because privacy concerns prevent the majority of intrusion detection agencies from sharing their sensitive data. In this paper, we propose the use of mimic learning to enable the transfer of intrusion detection knowledge through a teacher model trained on private data to a student model. This student model provides a mean of publicly sharing knowledge extracted from private data without sharing the data itself. Our results confirm that the proposed scheme can produce a student intrusion detection model that mimics the teacher model without requiring access to the original dataset.

CRApr 25, 2019
A Multi-Authority Attribute-Based Signcryption Scheme with Efficient Revocation for Smart Grid Downlink Communication

Ahmad Alsharif, Ahmad Shafee, Mahmoud Nabil et al.

In this paper, we propose a multi-authority attribute-based signcryption scheme with efficient revocation for smart grid downlink communications. In the proposed scheme, grid operators and electricity vendors can send multicast messages securely to different groups of consumers which is required in different applications such as firmware update distribution and sending direct load control messages. Our scheme can ensure the confidentiality and the integrity of the multicasted messages, allows consumers to authenticate the source of the multicasted messages, achieves and non-repudiation property, and allows prompt revocation, simultaneously which are required for the smart grid downlink communications. Our security analysis demonstrates that the proposed scheme can thwart various security threats to the smart grid. Our experiments conducted on an advanced metering infrastructure (AMI) testbed confirm that the proposed scheme has low computational overhead.

CRApr 22, 2019
Privacy-Preserving Smart Parking System Using Blockchain and Private Information Retrieval

Wesam Al Amiri, Mohamed Baza, Karim Banawan et al.

Searching for available parking spaces is a major problem for drivers especially in big crowded cities, causing traffic congestion and air pollution, and wasting drivers' time. Smart parking systems are a novel solution to enable drivers to have real-time parking information for pre-booking. However, current smart parking requires drivers to disclose their private information, such as desired destinations. Moreover, the existing schemes are centralized and vulnerable to the bottleneck of the single point of failure and data breaches. In this paper, we propose a distributed privacy-preserving smart parking system using blockchain. A consortium blockchain created by different parking lot owners to ensure security, transparency, and availability is proposed to store their parking offers on the blockchain. To preserve drivers' location privacy, we adopt a private information retrieval (PIR) technique to enable drivers to retrieve parking offers from blockchain nodes privately, without revealing which parking offers are retrieved. Furthermore, a short randomizable signature is used to enable drivers to reserve available parking slots in an anonymous manner. Besides, we introduce an anonymous payment system that cannot link drivers' to specific parking locations. Finally, our performance evaluations demonstrate that the proposed scheme can preserve drivers' privacy with low communication and computation overhead.

CRApr 11, 2019
Detecting Sybil Attacks using Proofs of Work and Location in VANETs

Mohamed Baza, Mahmoud Nabil, Niclas Bewermeier et al.

In this paper, we propose a Sybil attack detection scheme using proofs of work and location. The idea is that each road side unit (RSU) issues a signed time-stamped tag as a proof for the vehicle's anonymous location. Proofs sent from multiple consecutive RSUs is used to create vehicle trajectory which is used as vehicle anonymous identity. Also, one RSU is not able to issue trajectories for vehicles, rather the contributions of several RSUs are needed. By this way, attackers need to compromise an infeasible number of RSUs to create fake trajectories. Moreover, upon receiving the proof of location from an RSU, the vehicle should solve a computational puzzle by running proof of work (PoW) algorithm. So, it should provide a valid solution (proof of work) to the next RSU before it can obtain a proof of location. Using the PoW can prevent the vehicles from creating multiple trajectories in case of low-dense RSUs. Then, during any reported event, e.g., road congestion, the event manager uses a matching technique to identify the trajectories sent from Sybil vehicles. The scheme depends on the fact that the Sybil trajectories are bounded physically to one vehicle; therefore, their trajectories should overlap. Extensive experiments and simulations demonstrate that our scheme achieves high detection rate to Sybil attacks with low false negative and acceptable communication and computation overhead.

CYNov 14, 2018
Blockchain-based Firmware Update Scheme Tailored for Autonomous Vehicles

Mohamed Baza, Mahmoud Nabil, Noureddine Lasla et al.

Recently, Autonomous Vehicles (AVs) have gained extensive attention from both academia and industry. AVs are a complex system composed of many subsystems, making them a typical target for attackers. Therefore, the firmware of the different subsystems needs to be updated to the latest version by the manufacturer to fix bugs and introduce new features, e.g., using security patches. In this paper, we propose a distributed firmware update scheme for the AVs' subsystems, leveraging blockchain and smart contract technology. A consortium blockchain made of different AVs manufacturers is used to ensure the authenticity and integrity of firmware updates. Instead of depending on centralized third parties to distribute the new updates, we enable AVs, namely distributors, to participate in the distribution process and we take advantage of their mobility to guarantee high availability and fast delivery of the updates. To incentivize AVs to distribute the updates, a reward system is established that maintains a credit reputation for each distributor account in the blockchain. A zero-knowledge proof protocol is used to exchange the update in return for a proof of distribution in a trust-less environment. Moreover, we use attribute-based encryption (ABE) scheme to ensure that only authorized AVs will be able to download and use a new update. Our analysis indicates that the additional cryptography primitives and exchanged transactions do not affect the operation of the AVs network. Also, our security analysis demonstrates that our scheme is efficient and secure against different attacks.

CRNov 5, 2018
Blockchain-based Charging Coordination Mechanism for Smart Grid Energy Storage Units

Mohamed Baza, Mahmoud Nabil, Muhammad Ismail et al.

Energy storage units (ESUs) enable several attractive features of modern smart grids such as enhanced grid resilience, effective demand response, and reduced bills. However, uncoordinated charging of ESUs stresses the power system and can lead to a blackout. On the other hand, existing charging coordination mechanisms suffer from several limitations. First, the need for a central charging coordinator (CC) presents a single point of failure that jeopardizes the effectiveness of the charging coordination. Second, a transparent charging coordination mechanism does not exist where users are not aware whether the CC is honest or not in coordination charging requests among them in a fair way. Third, existing mechanisms overlook the privacy concerns of the involved customers. To address these limitations, in this paper, we leverage the blockchain and smart contracts to build a decentralized charging coordination mechanism without the need for a centralized charging coordinator. First ESUs should use tokens for anonymously authenticate themselves to the blockchain. Then each ESU sends a charging request that contains its State-of-Charge (SoC), Time-to-complete-charge (TCC) and amount of required charging to the smart contract address on the blockchain. The smart contract will then run the charging coordination mechanism in a self-executed manner such that ESUs with the highest priorities are charged in the present time slot while charging requests of lower priority ESUs are deferred to future time slots. In this way, each ESU can make sure that charging schedules are computed correctly. Finally, we have implemented the proposed mechanism on the Ethereum test-bed blockchain, and our analysis shows that execution cost can be acceptable in terms of gas consumption while enabling decentralized charging coordination with increased transparency, reliability, and privacy preserving.

CROct 3, 2018
EPIC: Efficient Privacy-Preserving Scheme with E2E Data Integrity and Authenticity for AMI Networks

Ahmad Alsharif, Mahmoud Nabil, Samet Tonyali et al.

In Advanced Metering Infrastructure (AMI) networks, smart meters should send fine-grained power consumption readings to electric utilities to perform real-time monitoring and energy management. However, these readings can leak sensitive information about consumers' activities. Various privacy-preserving schemes for collecting fine-grained readings have been proposed for AMI networks. These schemes aggregate individual readings and send an aggregated reading to the utility, but they extensively use asymmetric-key cryptography which involves large computation/communication overhead. Furthermore, they do not address End-to-End (E2E) data integrity, authenticity, and computing electricity bills based on dynamic prices. In this paper, we propose EPIC, an efficient and privacy-preserving data collection scheme with E2E data integrity verification for AMI networks. Using efficient cryptographic operations, each meter should send a masked reading to the utility such that all the masks are canceled after aggregating all meters' masked readings, and thus the utility can only obtain an aggregated reading to preserve consumers' privacy. The utility can verify the aggregated reading integrity without accessing the individual readings to preserve privacy. It can also identify the attackers and compute electricity bills efficiently by using the fine-grained readings without violating privacy. Furthermore, EPIC can resist collusion attacks in which the utility colludes with a relay node to extract the meters' readings. A formal proof, probabilistic analysis are used to evaluate the security of EPIC, and ns-3 is used to implement EPIC and evaluate the network performance. In addition, we compare EPIC to existing data collection schemes in terms of overhead and security/privacy features.

CRSep 19, 2018
Efficient and Privacy-Preserving Ride Sharing Organization for Transferable and Non-Transferable Services

Mahmoud Nabil, Ahmed Sherif, Mohamed Mahmoud et al.

Ride-sharing allows multiple persons to share their trips together in one vehicle instead of using multiple vehicles. This can reduce the number of vehicles in the street, which consequently can reduce air pollution, traffic congestion and transportation cost. However, a ride-sharing organization requires passengers to report sensitive location information about their trips to a trip organizing server (TOS) which creates a serious privacy issue. In addition, existing ride-sharing schemes are non-flexible, i.e., they require a driver and a rider to have exactly the same trip to share a ride. Moreover, they are non-scalable, i.e., inefficient if applied to large geographic areas. In this paper, we propose two efficient privacy-preserving ride-sharing organization schemes for Non-transferable Ride-sharing Services (NRS) and Transferable Ride-sharing Services (TRS). In the NRS scheme, a rider can share a ride from its source to destination with only one driver whereas, in TRS scheme, a rider can transfer between multiple drivers while en route until he reaches his destination. In both schemes, the ride-sharing area is divided into a number of small geographic areas, called cells, and each cell has a unique identifier. Each driver/rider should encrypt his trip's data and send an encrypted ride-sharing offer/request to the TOS. In NRS scheme, Bloom filters are used to compactly represent the trip information before encryption. Then, the TOS can measure the similarity between the encrypted trips data to organize shared rides without revealing either the users' identities or the location information. In TRS scheme, drivers report their encrypted routes, an then the TOS builds an encrypted directed graph that is passed to a modified version of Dijkstra's shortest path algorithm to search for an optimal path of rides that can achieve a set of preferences defined by the riders.

LGSep 6, 2018
Deep Recurrent Electricity Theft Detection in AMI Networks with Random Tuning of Hyper-parameters

Mahmoud Nabil, Muhammad Ismail, Mohamed Mahmoud et al.

Modern smart grids rely on advanced metering infrastructure (AMI) networks for monitoring and billing purposes. However, such an approach suffers from electricity theft cyberattacks. Different from the existing research that utilizes shallow, static, and customer-specific-based electricity theft detectors, this paper proposes a generalized deep recurrent neural network (RNN)-based electricity theft detector that can effectively thwart these cyberattacks. The proposed model exploits the time series nature of the customers' electricity consumption to implement a gated recurrent unit (GRU)-RNN, hence, improving the detection performance. In addition, the proposed RNN-based detector adopts a random search analysis in its learning stage to appropriately fine-tune its hyper-parameters. Extensive test studies are carried out to investigate the detector's performance using publicly available real data of 107,200 energy consumption days from 200 customers. Simulation results demonstrate the superior performance of the proposed detector compared with state-of-the-art electricity theft detectors.