Anis Koubâa

h-index59

43papers

1,141citations

Novelty31%

AI Score45

Ranked #42,136 of 194,257 authors (top 22%)#14,803 in CV (top 25%)

43 Papers

5.0CVMar 1, 2023Code

TAU: A Framework for Video-Based Traffic Analytics Leveraging Artificial Intelligence and Unmanned Aerial Systems

Bilel Benjdira, Anis Koubaa, Ahmad Taher Azar et al.

Smart traffic engineering and intelligent transportation services are in increasing demand from governmental authorities to optimize traffic performance and thus reduce energy costs, increase the drivers' safety and comfort, ensure traffic laws enforcement, and detect traffic violations. In this paper, we address this challenge, and we leverage the use of Artificial Intelligence (AI) and Unmanned Aerial Vehicles (UAVs) to develop an AI-integrated video analytics framework, called TAU (Traffic Analysis from UAVs), for automated traffic analytics and understanding. Unlike previous works on traffic video analytics, we propose an automated object detection and tracking pipeline from video processing to advanced traffic understanding using high-resolution UAV images. TAU combines six main contributions. First, it proposes a pre-processing algorithm to adapt the high-resolution UAV image as input to the object detector without lowering the resolution. This ensures an excellent detection accuracy from high-quality features, particularly the small size of detected objects from UAV images. Second, it introduces an algorithm for recalibrating the vehicle coordinates to ensure that vehicles are uniquely identified and tracked across the multiple crops of the same frame. Third, it presents a speed calculation algorithm based on accumulating information from successive frames. Fourth, TAU counts the number of vehicles per traffic zone based on the Ray Tracing algorithm. Fifth, TAU has a fully independent algorithm for crossroad arbitration based on the data gathered from the different zones surrounding it. Sixth, TAU introduces a set of algorithms for extracting twenty-four types of insights from the raw data collected. The code is shared here: https://github.com/bilel-bj/TAU. Video demonstrations are provided here: https://youtu.be/wXJV0H7LviU and here: https://youtu.be/kGv0gmtVEbI.

7.4ROAug 22, 2023Code

ROSGPT_Vision: Commanding Robots Using Only Language Models' Prompts

Bilel Benjdira, Anis Koubaa, Anas M. Ali

In this paper, we argue that the next generation of robots can be commanded using only Language Models' prompts. Every prompt interrogates separately a specific Robotic Modality via its Modality Language Model (MLM). A central Task Modality mediates the whole communication to execute the robotic mission via a Large Language Model (LLM). This paper gives this new robotic design pattern the name of: Prompting Robotic Modalities (PRM). Moreover, this paper applies this PRM design pattern in building a new robotic framework named ROSGPT_Vision. ROSGPT_Vision allows the execution of a robotic task using only two prompts: a Visual and an LLM prompt. The Visual Prompt extracts, in natural language, the visual semantic features related to the task under consideration (Visual Robotic Modality). Meanwhile, the LLM Prompt regulates the robotic reaction to the visual description (Task Modality). The framework automates all the mechanisms behind these two prompts. The framework enables the robot to address complex real-world scenarios by processing visual data, making informed decisions, and carrying out actions automatically. The framework comprises one generic vision module and two independent ROS nodes. As a test application, we used ROSGPT_Vision to develop CarMate, which monitors the driver's distraction on the roads and makes real-time vocal notifications to the driver. We showed how ROSGPT_Vision significantly reduced the development cost compared to traditional methods. We demonstrated how to improve the quality of the application by optimizing the prompting strategies, without delving into technical details. ROSGPT_Vision is shared with the community (link: https://github.com/bilel-bj/ROSGPT_Vision) to advance robotic research in this direction and to build more robotic frameworks that implement the PRM design pattern and enables controlling robots using only prompts.

1.5CVAug 30, 2023

Early Detection of Red Palm Weevil Infestations using Deep Learning Classification of Acoustic Signals

Wadii Boulila, Ayyub Alzahem, Anis Koubaa et al.

The Red Palm Weevil (RPW), also known as the palm weevil, is considered among the world's most damaging insect pests of palms. Current detection techniques include the detection of symptoms of RPW using visual or sound inspection and chemical detection of volatile signatures generated by infested palm trees. However, efficient detection of RPW diseases at an early stage is considered one of the most challenging issues for cultivating date palms. In this paper, an efficient approach to the early detection of RPW is proposed. The proposed approach is based on RPW sound activities being recorded and analyzed. The first step involves the conversion of sound data into images based on a selected set of features. The second step involves the combination of images from the same sound file but computed by different features into a single image. The third step involves the application of different Deep Learning (DL) techniques to classify resulting images into two classes: infested and not infested. Experimental results show good performances of the proposed approach for RPW detection using different DL techniques, namely MobileNetV2, ResNet50V2, ResNet152V2, VGG16, VGG19, DenseNet121, DenseNet201, Xception, and InceptionV3. The proposed approach outperformed existing techniques for public datasets.

2.9CLOct 16, 2023

Prediction of Arabic Legal Rulings using Large Language Models

Adel Ammar, Anis Koubaa, Bilel Benjdira et al.

In the intricate field of legal studies, the analysis of court decisions is a cornerstone for the effective functioning of the judicial system. The ability to predict court outcomes helps judges during the decision-making process and equips lawyers with invaluable insights, enhancing their strategic approaches to cases. Despite its significance, the domain of Arabic court analysis remains under-explored. This paper pioneers a comprehensive predictive analysis of Arabic court decisions on a dataset of 10,813 commercial court real cases, leveraging the advanced capabilities of the current state-of-the-art large language models. Through a systematic exploration, we evaluate three prevalent foundational models (LLaMA-7b, JAIS-13b, and GPT3.5-turbo) and three training paradigms: zero-shot, one-shot, and tailored fine-tuning. Besides, we assess the benefit of summarizing and/or translating the original Arabic input texts. This leads to a spectrum of 14 model variants, for which we offer a granular performance assessment with a series of different metrics (human assessment, GPT evaluation, ROUGE, and BLEU scores). We show that all variants of LLaMA models yield limited performance, whereas GPT-3.5-based models outperform all other models by a wide margin, surpassing the average score of the dedicated Arabic-centric JAIS model by 50%. Furthermore, we show that all scores except human evaluation are inconsistent and unreliable for assessing the performance of large language models on court decision predictions. This study paves the way for future research, bridging the gap between computational linguistics and Arabic legal analytics.

6.6IVMar 15, 2022

Securing the Classification of COVID-19 in Chest X-ray Images: A Privacy-Preserving Deep Learning Approach

Wadii Boulila, Adel Ammar, Bilel Benjdira et al.

Deep learning (DL) is being increasingly utilized in healthcare-related fields due to its outstanding efficiency. However, we have to keep the individual health data used by DL models private and secure. Protecting data and preserving the privacy of individuals has become an increasingly prevalent issue. The gap between the DL and privacy communities must be bridged. In this paper, we propose privacy-preserving deep learning (PPDL)-based approach to secure the classification of Chest X-ray images. This study aims to use Chest X-ray images to their fullest potential without compromising the privacy of the data that it contains. The proposed approach is based on two steps: encrypting the dataset using partially homomorphic encryption and training/testing the DL algorithm over the encrypted images. Experimental results on the COVID-19 Radiography database show that the MobileNetV2 model achieves an accuracy of 94.2% over the plain data and 93.3% over the encrypted data.

3.9CVSep 21, 2023

License Plate Super-Resolution Using Diffusion Models

Sawsan AlHalawani, Bilel Benjdira, Adel Ammar et al.

In surveillance, accurately recognizing license plates is hindered by their often low quality and small dimensions, compromising recognition precision. Despite advancements in AI-based image super-resolution, methods like Convolutional Neural Networks (CNNs) and Generative Adversarial Networks (GANs) still fall short in enhancing license plate images. This study leverages the cutting-edge diffusion model, which has consistently outperformed other deep learning techniques in image restoration. By training this model using a curated dataset of Saudi license plates, both in low and high resolutions, we discovered the diffusion model's superior efficacy. The method achieves a 12.55\% and 37.32% improvement in Peak Signal-to-Noise Ratio (PSNR) over SwinIR and ESRGAN, respectively. Moreover, our method surpasses these techniques in terms of Structural Similarity Index (SSIM), registering a 4.89% and 17.66% improvement over SwinIR and ESRGAN, respectively. Furthermore, 92% of human evaluators preferred our images over those from other algorithms. In essence, this research presents a pioneering solution for license plate super-resolution, with tangible potential for surveillance systems.

1.5CVApr 26, 2023

Streamlined Global and Local Features Combinator (SGLC) for High Resolution Image Dehazing

Bilel Benjdira, Anas M. Ali, Anis Koubaa

Image Dehazing aims to remove atmospheric fog or haze from an image. Although the Dehazing models have evolved a lot in recent years, few have precisely tackled the problem of High-Resolution hazy images. For this kind of image, the model needs to work on a downscaled version of the image or on cropped patches from it. In both cases, the accuracy will drop. This is primarily due to the inherent failure to combine global and local features when the image size increases. The Dehazing model requires global features to understand the general scene peculiarities and the local features to work better with fine and pixel details. In this study, we propose the Streamlined Global and Local Features Combinator (SGLC) to solve these issues and to optimize the application of any Dehazing model to High-Resolution images. The SGLC contains two successive blocks. The first is the Global Features Generator (GFG) which generates the first version of the Dehazed image containing strong global features. The second block is the Local Features Enhancer (LFE) which improves the local feature details inside the previously generated image. When tested on the Uformer architecture for Dehazing, SGLC increased the PSNR metric by a significant margin. Any other model can be incorporated inside the SGLC process to improve its efficiency on High-Resolution input data.

2.0LGApr 18, 2023

Contactless Human Activity Recognition using Deep Learning with Flexible and Scalable Software Define Radio

Muhammad Zakir Khan, Jawad Ahmad, Wadii Boulila et al.

Ambient computing is gaining popularity as a major technological advancement for the future. The modern era has witnessed a surge in the advancement in healthcare systems, with viable radio frequency solutions proposed for remote and unobtrusive human activity recognition (HAR). Specifically, this study investigates the use of Wi-Fi channel state information (CSI) as a novel method of ambient sensing that can be employed as a contactless means of recognizing human activity in indoor environments. These methods avoid additional costly hardware required for vision-based systems, which are privacy-intrusive, by (re)using Wi-Fi CSI for various safety and security applications. During an experiment utilizing universal software-defined radio (USRP) to collect CSI samples, it was observed that a subject engaged in six distinct activities, which included no activity, standing, sitting, and leaning forward, across different areas of the room. Additionally, more CSI samples were collected when the subject walked in two different directions. This study presents a Wi-Fi CSI-based HAR system that assesses and contrasts deep learning approaches, namely convolutional neural network (CNN), long short-term memory (LSTM), and hybrid (LSTM+CNN), employed for accurate activity recognition. The experimental results indicate that LSTM surpasses current models and achieves an average accuracy of 95.3% in multi-activity classification when compared to CNN and hybrid techniques. In the future, research needs to study the significance of resilience in diverse and dynamic environments to identify the activity of multiple users.

1.5CVJun 29, 2023

Sustainable Palm Tree Farming: Leveraging IoT and Multi-Modal Data for Early Detection and Mapping of Red Palm Weevil

Yosra Hajjaji, Ayyub Alzahem, Wadii Boulila et al.

The Red Palm Weevil (RPW) is a highly destructive insect causing economic losses and impacting palm tree farming worldwide. This paper proposes an innovative approach for sustainable palm tree farming by utilizing advanced technologies for the early detection and management of RPW. Our approach combines computer vision, deep learning (DL), the Internet of Things (IoT), and geospatial data to detect and classify RPW-infested palm trees effectively. The main phases include; (1) DL classification using sound data from IoT devices, (2) palm tree detection using YOLOv8 on UAV images, and (3) RPW mapping using geospatial data. Our custom DL model achieves 100% precision and recall in detecting and localizing infested palm trees. Integrating geospatial data enables the creation of a comprehensive RPW distribution map for efficient monitoring and targeted management strategies. This technology-driven approach benefits agricultural authorities, farmers, and researchers in managing RPW infestations and safeguarding palm tree plantations' productivity.

2.6CVMar 15, 2022

Parking Analytics Framework using Deep Learning

Bilel Benjdira, Anis Koubaa, Wadii Boulila et al.

With the number of vehicles continuously increasing, parking monitoring and analysis are becoming a substantial feature of modern cities. In this study, we present a methodology to monitor car parking areas and to analyze their occupancy in real-time. The solution is based on a combination between image analysis and deep learning techniques. It incorporates four building blocks put inside a pipeline: vehicle detection, vehicle tracking, manual annotation of parking slots, and occupancy estimation using the Ray Tracing algorithm. The aim of this methodology is to optimize the use of parking areas and to reduce the time wasted by daily drivers to find the right parking slot for their cars. Also, it helps to better manage the space of the parking areas and to discover misuse cases. A demonstration of the provided solution is shown in the following video link: https://www.youtube.com/watch?v=KbAt8zT14Tc.

3.9CVSep 27, 2023

Guided Frequency Loss for Image Restoration

Bilel Benjdira, Anas M. Ali, Anis Koubaa

Image Restoration has seen remarkable progress in recent years. Many generative models have been adapted to tackle the known restoration cases of images. However, the interest in benefiting from the frequency domain is not well explored despite its major factor in these particular cases of image synthesis. In this study, we propose the Guided Frequency Loss (GFL), which helps the model to learn in a balanced way the image's frequency content alongside the spatial content. It aggregates three major components that work in parallel to enhance learning efficiency; a Charbonnier component, a Laplacian Pyramid component, and a Gradual Frequency component. We tested GFL on the Super Resolution and the Denoising tasks. We used three different datasets and three different architectures for each of them. We found that the GFL loss improved the PSNR metric in most implemented experiments. Also, it improved the training of the Super Resolution models in both SwinIR and SRGAN. In addition, the utility of the GFL loss increased better on constrained data due to the less stochasticity in the high frequencies' components among samples.

2.2ROJul 14, 2022

Covy: An AI-powered Robot with a Compound Vision System for Detecting Breaches in Social Distancing

Serge Saaybi, Amjad Yousef Majid, R Venkatesha Prasad et al.

This paper introduces a compound vision system that enables robots to localize people up to 15m away using a cheap camera. And, it proposes a robust navigation stack that combines Deep Reinforcement Learning (DRL) and a probabilistic localization method. To test the efficacy of these systems, we prototyped a low-cost mobile robot that we call Covy. Covy can be used for applications such as promoting social distancing during pandemics or estimating the density of a crowd. We evaluated Covy's performance through extensive sets of experiments both in simulated and realistic environments. Our results show that Covy's compound vision algorithm doubles the range of the used depth camera, and its hybrid navigation stack is more robust than a pure DRL-based one.

5.5CLJul 30, 2024

Enhancing Semantic Similarity Understanding in Arabic NLP with Nested Embedding Learning

Omer Nacar, Anis Koubaa

This work presents a novel framework for training Arabic nested embedding models through Matryoshka Embedding Learning, leveraging multilingual, Arabic-specific, and English-based models, to highlight the power of nested embeddings models in various Arabic NLP downstream tasks. Our innovative contribution includes the translation of various sentence similarity datasets into Arabic, enabling a comprehensive evaluation framework to compare these models across different dimensions. We trained several nested embedding models on the Arabic Natural Language Inference triplet dataset and assessed their performance using multiple evaluation metrics, including Pearson and Spearman correlations for cosine similarity, Manhattan distance, Euclidean distance, and dot product similarity. The results demonstrate the superior performance of the Matryoshka embedding models, particularly in capturing semantic nuances unique to the Arabic language. Results demonstrated that Arabic Matryoshka embedding models have superior performance in capturing semantic nuances unique to the Arabic language, significantly outperforming traditional models by up to 20-25\% across various similarity metrics. These results underscore the effectiveness of language-specific training and highlight the potential of Matryoshka models in enhancing semantic textual similarity tasks for Arabic NLP.

18.8CLFeb 28, 2025Code

Palm: A Culturally Inclusive and Linguistically Diverse Dataset for Arabic LLMs

Fakhraddin Alwajih, Abdellah El Mekki, Samar Mohamed Magdy et al.

As large language models (LLMs) become increasingly integrated into daily life, ensuring their cultural sensitivity and inclusivity is paramount. We introduce our dataset, a year-long community-driven project covering all 22 Arab countries. The dataset includes instructions (input, response pairs) in both Modern Standard Arabic (MSA) and dialectal Arabic (DA), spanning 20 diverse topics. Built by a team of 44 researchers across the Arab world, all of whom are authors of this paper, our dataset offers a broad, inclusive perspective. We use our dataset to evaluate the cultural and dialectal capabilities of several frontier LLMs, revealing notable limitations. For instance, while closed-source LLMs generally exhibit strong performance, they are not without flaws, and smaller open-source models face greater challenges. Moreover, certain countries (e.g., Egypt, the UAE) appear better represented than others (e.g., Iraq, Mauritania, Yemen). Our annotation guidelines, code, and data for reproducibility are publicly available.

4.1ROOct 24, 2024Code

VECTOR: Velocity-Enhanced GRU Neural Network for Real-Time 3D UAV Trajectory Prediction

Omer Nacar, Mohamed Abdelkader, Lahouari Ghouti et al.

This paper tackles the challenge of real-time 3D trajectory prediction for UAVs, which is critical for applications such as aerial surveillance and defense. Existing prediction models that rely primarily on position data struggle with accuracy, especially when UAV movements fall outside the position domain used in training. Our research identifies a gap in utilizing velocity estimates, first-order dynamics, to better capture the dynamics and enhance prediction accuracy and generalizability in any position domain. To bridge this gap, we propose a new trajectory prediction method using Gated Recurrent Units (GRUs) within sequence-based neural networks. Unlike traditional methods that rely on RNNs or transformers, this approach forecasts future velocities and positions based on historical velocity data instead of positions. This is designed to enhance prediction accuracy and scalability, overcoming challenges faced by conventional models in handling complex UAV dynamics. The methodology employs both synthetic and real-world 3D UAV trajectory data, capturing a wide range of flight patterns, speeds, and agility. Synthetic data is generated using the Gazebo simulator and PX4 Autopilot, while real-world data comes from the UZH-FPV and Mid-Air drone racing datasets. The GRU-based models significantly outperform state-of-the-art RNN approaches, with a mean square error (MSE) as low as 2 x 10^-8. Overall, our findings confirm the effectiveness of incorporating velocity data in improving the accuracy of UAV trajectory predictions across both synthetic and real-world scenarios, in and out of position data distributions. Finally, we open-source our 5000 trajectories dataset and a ROS 2 package to facilitate the integration with existing ROS-based UAV systems.

5.2CVJun 1, 2024Code

An Effective Weight Initialization Method for Deep Learning: Application to Satellite Image Classification

Wadii Boulila, Eman Alshanqiti, Ayyub Alzahem et al.

The growing interest in satellite imagery has triggered the need for efficient mechanisms to extract valuable information from these vast data sources, providing deeper insights. Even though deep learning has shown significant progress in satellite image classification. Nevertheless, in the literature, only a few results can be found on weight initialization techniques. These techniques traditionally involve initializing the networks' weights before training on extensive datasets, distinct from fine-tuning the weights of pre-trained networks. In this study, a novel weight initialization method is proposed in the context of satellite image classification. The proposed weight initialization method is mathematically detailed during the forward and backward passes of the convolutional neural network (CNN) model. Extensive experiments are carried out using six real-world datasets. Comparative analyses with existing weight initialization techniques made on various well-known CNN models reveal that the proposed weight initialization technique outperforms the previous competitive techniques in classification accuracy. The complete code of the proposed technique, along with the obtained results, is available at https://github.com/WadiiBoulila/Weight-Initialization

4.8CLFeb 23, 2024

ArabianGPT: Native Arabic GPT-based Large Language Model

Anis Koubaa, Adel Ammar, Lahouari Ghouti et al.

The predominance of English and Latin-based large language models (LLMs) has led to a notable deficit in native Arabic LLMs. This discrepancy is accentuated by the prevalent inclusion of English tokens in existing Arabic models, detracting from their efficacy in processing native Arabic's intricate morphology and syntax. Consequently, there is a theoretical and practical imperative for developing LLMs predominantly focused on Arabic linguistic elements. To address this gap, this paper proposes ArabianGPT, a series of transformer-based models within the ArabianLLM suite designed explicitly for Arabic. These models, including ArabianGPT-0.1B and ArabianGPT-0.3B, vary in size and complexity, aligning with the nuanced linguistic characteristics of Arabic. The AraNizer tokenizer, integral to these models, addresses the unique morphological aspects of Arabic script, ensuring more accurate text processing. Empirical results from fine-tuning the models on tasks like sentiment analysis and summarization demonstrate significant improvements. For sentiment analysis, the fine-tuned ArabianGPT-0.1B model achieved a remarkable accuracy of 95%, a substantial increase from the base model's 56%. Similarly, in summarization tasks, fine-tuned models showed enhanced F1 scores, indicating improved precision and recall in generating concise summaries. Comparative analysis of fine-tuned ArabianGPT models against their base versions across various benchmarks reveals nuanced differences in performance, with fine-tuning positively impacting specific tasks like question answering and summarization. These findings underscore the efficacy of fine-tuning in aligning ArabianGPT models more closely with specific NLP tasks, highlighting the potential of tailored transformer architectures in advancing Arabic NLP.

9.6CLOct 24, 2025

REMONI: An Autonomous System Integrating Wearables and Multimodal Large Language Models for Enhanced Remote Health Monitoring

Thanh Cong Ho, Farah Kharrat, Abderrazek Abid et al.

With the widespread adoption of wearable devices in our daily lives, the demand and appeal for remote patient monitoring have significantly increased. Most research in this field has concentrated on collecting sensor data, visualizing it, and analyzing it to detect anomalies in specific diseases such as diabetes, heart disease and depression. However, this domain has a notable gap in the aspect of human-machine interaction. This paper proposes REMONI, an autonomous REmote health MONItoring system that integrates multimodal large language models (MLLMs), the Internet of Things (IoT), and wearable devices. The system automatically and continuously collects vital signs, accelerometer data from a special wearable (such as a smartwatch), and visual data in patient video clips collected from cameras. This data is processed by an anomaly detection module, which includes a fall detection model and algorithms to identify and alert caregivers of the patient's emergency conditions. A distinctive feature of our proposed system is the natural language processing component, developed with MLLMs capable of detecting and recognizing a patient's activity and emotion while responding to healthcare worker's inquiries. Additionally, prompt engineering is employed to integrate all patient information seamlessly. As a result, doctors and nurses can access real-time vital signs and the patient's current state and mood by interacting with an intelligent agent through a user-friendly web application. Our experiments demonstrate that our system is implementable and scalable for real-life scenarios, potentially reducing the workload of medical professionals and healthcare costs. A full-fledged prototype illustrating the functionalities of the system has been developed and being tested to demonstrate the robustness of its various capabilities.

3.6CVOct 22, 2025

Enhancing Early Alzheimer Disease Detection through Big Data and Ensemble Few-Shot Learning

Safa Ben Atitallah, Maha Driss, Wadii Boulila et al.

Alzheimer disease is a severe brain disorder that causes harm in various brain areas and leads to memory damage. The limited availability of labeled medical data poses a significant challenge for accurate Alzheimer disease detection. There is a critical need for effective methods to improve the accuracy of Alzheimer disease detection, considering the scarcity of labeled data, the complexity of the disease, and the constraints related to data privacy. To address this challenge, our study leverages the power of big data in the form of pre-trained Convolutional Neural Networks (CNNs) within the framework of Few-Shot Learning (FSL) and ensemble learning. We propose an ensemble approach based on a Prototypical Network (ProtoNet), a powerful method in FSL, integrating various pre-trained CNNs as encoders. This integration enhances the richness of features extracted from medical images. Our approach also includes a combination of class-aware loss and entropy loss to ensure a more precise classification of Alzheimer disease progression levels. The effectiveness of our method was evaluated using two datasets, the Kaggle Alzheimer dataset and the ADNI dataset, achieving an accuracy of 99.72% and 99.86%, respectively. The comparison of our results with relevant state-of-the-art studies demonstrated that our approach achieved superior accuracy and highlighted its validity and potential for real-world applications in early Alzheimer disease detection.

2.0CVMay 23, 2024

Feature Fusion for Improved Classification: Combining Dempster-Shafer Theory and Multiple CNN Architectures

Ayyub Alzahem, Wadii Boulila, Maha Driss et al.

Addressing uncertainty in Deep Learning (DL) is essential, as it enables the development of models that can make reliable predictions and informed decisions in complex, real-world environments where data may be incomplete or ambiguous. This paper introduces a novel algorithm leveraging Dempster-Shafer Theory (DST) to integrate multiple pre-trained models to form an ensemble capable of providing more reliable and enhanced classifications. The main steps of the proposed method include feature extraction, mass function calculation, fusion, and expected utility calculation. Several experiments have been conducted on CIFAR-10 and CIFAR-100 datasets, demonstrating superior classification accuracy of the proposed DST-based method, achieving improvements of 5.4% and 8.4%, respectively, compared to the best individual pre-trained models. Results highlight the potential of DST as a robust framework for managing uncertainties related to data when applying DL in real-world scenarios.

2.6LGNov 28, 2024

Self-Supervised Learning for Graph-Structured Data in Healthcare Applications: A Comprehensive Review

Safa Ben Atitallah, Chaima Ben Rabah, Maha Driss et al.

The abundance of complex and interconnected healthcare data offers numerous opportunities to improve prediction, diagnosis, and treatment. Graph-structured data, which includes entities and their relationships, is well-suited for capturing complex connections. Effectively utilizing this data often requires strong and efficient learning algorithms, especially when dealing with limited labeled data. It is increasingly important for downstream tasks in various domains to utilize self-supervised learning (SSL) as a paradigm for learning and optimizing effective representations from unlabeled data. In this paper, we thoroughly review SSL approaches specifically designed for graph-structured data in healthcare applications. We explore the challenges and opportunities associated with healthcare data and assess the effectiveness of SSL techniques in real-world healthcare applications. Our discussion encompasses various healthcare settings, such as disease prediction, medical image analysis, and drug discovery. We critically evaluate the performance of different SSL methods across these tasks, highlighting their strengths, limitations, and potential future research directions. Ultimately, this review aims to be a valuable resource for both researchers and practitioners looking to utilize SSL for graph-structured data in healthcare, paving the way for improved outcomes and insights in this critical field. To the best of our knowledge, this work represents the first comprehensive review of the literature on SSL applied to graph data in healthcare.

8.3CLMay 30, 2025

GATE: General Arabic Text Embedding for Enhanced Semantic Textual Similarity with Matryoshka Representation Learning and Hybrid Loss Training

Omer Nacar, Anis Koubaa, Serry Sibaee et al.

Semantic textual similarity (STS) is a critical task in natural language processing (NLP), enabling applications in retrieval, clustering, and understanding semantic relationships between texts. However, research in this area for the Arabic language remains limited due to the lack of high-quality datasets and pre-trained models. This scarcity of resources has restricted the accurate evaluation and advance of semantic similarity in Arabic text. This paper introduces General Arabic Text Embedding (GATE) models that achieve state-of-the-art performance on the Semantic Textual Similarity task within the MTEB benchmark. GATE leverages Matryoshka Representation Learning and a hybrid loss training approach with Arabic triplet datasets for Natural Language Inference, which are essential for enhancing model performance in tasks that demand fine-grained semantic understanding. GATE outperforms larger models, including OpenAI, with a 20-25% performance improvement on STS benchmarks, effectively capturing the unique semantic nuances of Arabic.

4.6LGDec 24, 2024

Exploring Graph Mamba: A Comprehensive Survey on State-Space Models for Graph Learning

Safa Ben Atitallah, Chaima Ben Rabah, Maha Driss et al.

Graph Mamba, a powerful graph embedding technique, has emerged as a cornerstone in various domains, including bioinformatics, social networks, and recommendation systems. This survey represents the first comprehensive study devoted to Graph Mamba, to address the critical gaps in understanding its applications, challenges, and future potential. We start by offering a detailed explanation of the original Graph Mamba architecture, highlighting its key components and underlying mechanisms. Subsequently, we explore the most recent modifications and enhancements proposed to improve its performance and applicability. To demonstrate the versatility of Graph Mamba, we examine its applications across diverse domains. A comparative analysis of Graph Mamba and its variants is conducted to shed light on their unique characteristics and potential use cases. Furthermore, we identify potential areas where Graph Mamba can be applied in the future, highlighting its potential to revolutionize data analysis in these fields. Finally, we address the current limitations and open research questions associated with Graph Mamba. By acknowledging these challenges, we aim to stimulate further research and development in this promising area. This survey serves as a valuable resource for both newcomers and experienced researchers seeking to understand and leverage the power of Graph Mamba.

2.6LGDec 17, 2024

Enhancing Internet of Things Security throughSelf-Supervised Graph Neural Networks

Safa Ben Atitallah, Maha Driss, Wadii Boulila et al.

With the rapid rise of the Internet of Things (IoT), ensuring the security of IoT devices has become essential. One of the primary challenges in this field is that new types of attacks often have significantly fewer samples than more common attacks, leading to unbalanced datasets. Existing research on detecting intrusions in these unbalanced labeled datasets primarily employs Convolutional Neural Networks (CNNs) or conventional Machine Learning (ML) models, which result in incomplete detection, especially for new attacks. To handle these challenges, we suggest a new approach to IoT intrusion detection using Self-Supervised Learning (SSL) with a Markov Graph Convolutional Network (MarkovGCN). Graph learning excels at modeling complex relationships within data, while SSL mitigates the issue of limited labeled data for emerging attacks. Our approach leverages the inherent structure of IoT networks to pre-train a GCN, which is then fine-tuned for the intrusion detection task. The integration of Markov chains in GCN uncovers network structures and enriches node and edge features with contextual information. Experimental results demonstrate that our approach significantly improves detection accuracy and robustness compared to conventional supervised learning methods. Using the EdgeIIoT-set dataset, we attained an accuracy of 98.68\%, a precision of 98.18%, a recall of 98.35%, and an F1-Score of 98.40%.

9.6AISep 14, 2025

Agentic UAVs: LLM-Driven Autonomy with Integrated Tool-Calling and Cognitive Reasoning

Anis Koubaa, Khaled Gabr

Unmanned Aerial Vehicles (UAVs) are increasingly deployed in defense, surveillance, and disaster response, yet most systems remain confined to SAE Level 2--3 autonomy. Their reliance on rule-based control and narrow AI restricts adaptability in dynamic, uncertain missions. Existing UAV frameworks lack context-aware reasoning, autonomous decision-making, and ecosystem-level integration; critically, none leverage Large Language Model (LLM) agents with tool-calling for real-time knowledge access. This paper introduces the Agentic UAVs framework, a five-layer architecture (Perception, Reasoning, Action, Integration, Learning) that augments UAVs with LLM-driven reasoning, database querying, and third-party system interaction. A ROS2 and Gazebo-based prototype integrates YOLOv11 object detection with GPT-4 reasoning and local Gemma-3 deployment. In simulated search-and-rescue scenarios, agentic UAVs achieved higher detection confidence (0.79 vs. 0.72), improved person detection rates (91% vs. 75%), and markedly increased action recommendation (92% vs. 4.5%). These results confirm that modest computational overhead enables qualitatively new levels of autonomy and ecosystem integration.

3.6CVAug 28, 2025

Dual-Stage Global and Local Feature Framework for Image Dehazing

Anas M. Ali, Anis Koubaa, Bilel Benjdira

Addressing the challenge of removing atmospheric fog or haze from digital images, known as image dehazing, has recently gained significant traction in the computer vision community. Although contemporary dehazing models have demonstrated promising performance, few have thoroughly investigated high-resolution imagery. In such scenarios, practitioners often resort to downsampling the input image or processing it in smaller patches, which leads to a notable performance degradation. This drop is primarily linked to the difficulty of effectively combining global contextual information with localized, fine-grained details as the spatial resolution grows. In this chapter, we propose a novel framework, termed the Streamlined Global and Local Features Combinator (SGLC), to bridge this gap and enable robust dehazing for high-resolution inputs. Our approach is composed of two principal components: the Global Features Generator (GFG) and the Local Features Enhancer (LFE). The GFG produces an initial dehazed output by focusing on broad contextual understanding of the scene. Subsequently, the LFE refines this preliminary output by enhancing localized details and pixel-level features, thereby capturing the interplay between global appearance and local structure. To evaluate the effectiveness of SGLC, we integrated it with the Uformer architecture, a state-of-the-art dehazing model. Experimental results on high-resolution datasets reveal a considerable improvement in peak signal-to-noise ratio (PSNR) when employing SGLC, indicating its potency in addressing haze in large-scale imagery. Moreover, the SGLC design is model-agnostic, allowing any dehazing network to be augmented with the proposed global-and-local feature fusion mechanism. Through this strategy, practitioners can harness both scene-level cues and granular details, significantly improving visual fidelity in high-resolution environments.

2.3CRJun 4, 2024

Strengthening Network Intrusion Detection in IoT Environments with Self-Supervised Learning and Few Shot Learning

Safa Ben Atitallah, Maha Driss, Wadii Boulila et al.

The Internet of Things (IoT) has been introduced as a breakthrough technology that integrates intelligence into everyday objects, enabling high levels of connectivity between them. As the IoT networks grow and expand, they become more susceptible to cybersecurity attacks. A significant challenge in current intrusion detection systems for IoT includes handling imbalanced datasets where labeled data are scarce, particularly for new and rare types of cyber attacks. Existing literature often fails to detect such underrepresented attack classes. This paper introduces a novel intrusion detection approach designed to address these challenges. By integrating Self Supervised Learning (SSL), Few Shot Learning (FSL), and Random Forest (RF), our approach excels in learning from limited and imbalanced data and enhancing detection capabilities. The approach starts with a Deep Infomax model trained to extract key features from the dataset. These features are then fed into a prototypical network to generate discriminate embedding. Subsequently, an RF classifier is employed to detect and classify potential malware, including a range of attacks that are frequently observed in IoT networks. The proposed approach was evaluated through two different datasets, MaleVis and WSN-DS, which demonstrate its superior performance with accuracies of 98.60% and 99.56%, precisions of 98.79% and 99.56%, recalls of 98.60% and 99.56%, and F1-scores of 98.63% and 99.56%, respectively.

4.7SEMar 24, 2024

LLMs as Compiler for Arabic Programming Language

Serry Sibaee, Omar Najar, Lahouri Ghouti et al.

In this paper we introduce APL (Arabic Programming Language) that uses Large language models (LLM) as semi-compiler to covert Arabic text code to python code then run the code. Designing a full pipeline from the structure of the APL text then a prompt (using prompt engineering) then running the prodcued python code using PyRunner. This project has a three parts first python library, a playground with simple interface and this research paper.

5.3IVMay 12, 2023

Unlocking the Potential of Medical Imaging with ChatGPT's Intelligent Diagnostics

Ayyub Alzahem, Shahid Latif, Wadii Boulila et al.

Medical imaging is an essential tool for diagnosing various healthcare diseases and conditions. However, analyzing medical images is a complex and time-consuming task that requires expertise and experience. This article aims to design a decision support system to assist healthcare providers and patients in making decisions about diagnosing, treating, and managing health conditions. The proposed architecture contains three stages: 1) data collection and labeling, 2) model training, and 3) diagnosis report generation. The key idea is to train a deep learning model on a medical image dataset to extract four types of information: the type of image scan, the body part, the test image, and the results. This information is then fed into ChatGPT to generate automatic diagnostics. The proposed system has the potential to enhance decision-making, reduce costs, and improve the capabilities of healthcare providers. The efficacy of the proposed system is analyzed by conducting extensive experiments on a large medical image dataset. The experimental outcomes exhibited promising performance for automatic diagnosis through medical images.

6.6SEMay 10, 2023

Humans are Still Better than ChatGPT: Case of the IEEEXtreme Competition

Anis Koubaa, Basit Qureshi, Adel Ammar et al.

Since the release of ChatGPT, numerous studies have highlighted the remarkable performance of ChatGPT, which often rivals or even surpasses human capabilities in various tasks and domains. However, this paper presents a contrasting perspective by demonstrating an instance where human performance excels in typical tasks suited for ChatGPT, specifically in the domain of computer programming. We utilize the IEEExtreme Challenge competition as a benchmark, a prestigious, annual international programming contest encompassing a wide range of problems with different complexities. To conduct a thorough evaluation, we selected and executed a diverse set of 102 challenges, drawn from five distinct IEEExtreme editions, using three major programming languages: Python, Java, and C++. Our empirical analysis provides evidence that contrary to popular belief, human programmers maintain a competitive edge over ChatGPT in certain aspects of problem-solving within the programming context. In fact, we found that the average score obtained by ChatGPT on the set of IEEExtreme programming problems is 3.9 to 5.8 times lower than the average human score, depending on the programming language. This paper elaborates on these findings, offering critical insights into the limitations and potential areas of improvement for AI-based language models like ChatGPT.

1.4CVMay 10, 2021

An Enhanced Randomly Initialized Convolutional Neural Network for Columnar Cactus Recognition in Unmanned Aerial Vehicle Imagery

Safa Ben Atitallah, Maha Driss, Wadii Boulila et al.

Recently, Convolutional Neural Networks (CNNs) have made a great performance for remote sensing image classification. Plant recognition using CNNs is one of the active deep learning research topics due to its added-value in different related fields, especially environmental conservation and natural areas preservation. Automatic recognition of plants in protected areas helps in the surveillance process of these zones and ensures the sustainability of their ecosystems. In this work, we propose an Enhanced Randomly Initialized Convolutional Neural Network (ERI-CNN) for the recognition of columnar cactus, which is an endemic plant that exists in the Tehuacán-Cuicatlán Valley in southeastern Mexico. We used a public dataset created by a group of researchers that consists of more than 20000 remote sensing images. The experimental results confirm the effectiveness of the proposed model compared to other models reported in the literature like InceptionV3 and the modified LeNet-5 CNN. Our ERI-CNN provides 98% of accuracy, 97% of precision, 97% of recall, 97.5% as f1-score, and 0.056 loss.

1.6IRJul 21, 2020

Deep Learning Techniques for Future Intelligent Cross-Media Retrieval

Sadaqat ur Rehman, Muhammad Waqas, Shanshan Tu et al.

With the advancement in technology and the expansion of broadcasting, cross-media retrieval has gained much attention. It plays a significant role in big data applications and consists in searching and finding data from different types of media. In this paper, we provide a novel taxonomy according to the challenges faced by multi-modal deep learning approaches in solving cross-media retrieval, namely: representation, alignment, and translation. These challenges are evaluated on deep learning (DL) based methods, which are categorized into four main groups: 1) unsupervised methods, 2) supervised methods, 3) pairwise based methods, and 4) rank based methods. Then, we present some well-known cross-media datasets used for retrieval, considering the importance of these datasets in the context in of deep learning based cross-media retrieval approaches. Moreover, we also present an extensive review of the state-of-the-art problems and its corresponding solutions for encouraging deep learning in cross-media retrieval. The fundamental objective of this work is to exploit Deep Neural Networks (DNNs) for bridging the "media gap", and provide researchers and developers with a better understanding of the underlying problems and the potential solutions of deep learning assisted cross-media retrieval. To the best of our knowledge, this is the first comprehensive survey to address cross-media retrieval under deep learning methods.

3.3CVMay 11, 2020

Deep-Learning-based Automated Palm Tree Counting and Geolocation in Large Farms from Aerial Geotagged Images

Adel Ammar, Anis Koubaa

In this paper, we propose a deep learning framework for the automated counting and geolocation of palm trees from aerial images using convolutional neural networks. For this purpose, we collected aerial images in a palm tree Farm in the Kharj region, in Riyadh Saudi Arabia, using DJI drones, and we built a dataset of around 10,000 instances of palms trees. Then, we developed a convolutional neural network model using the state-of-the-art, Faster R-CNN algorithm. Furthermore, using the geotagged metadata of aerial images, we used photogrammetry concepts and distance corrections to detect the geographical location of detected palms trees automatically. This geolocation technique was tested on two different types of drones (DJI Mavic Pro, and Phantom 4 Pro), and was assessed to provide an average geolocation accuracy of 2.8m. This GPS tagging allows us to uniquely identify palm trees and count their number from a series of drone images, while correctly dealing with the issue of image overlapping. Moreover, it can be generalized to the geolocation of any other objects in UAV images.

2.3CVApr 18, 2020Code

DriftNet: Aggressive Driving Behavior Classification using 3D EfficientNet Architecture

Alam Noor, Bilel Benjdira, Adel Ammar et al.

Aggressive driving (i.e., car drifting) is a dangerous behavior that puts human safety and life into a significant risk. This behavior is considered as an anomaly concerning the regular traffic in public transportation roads. Recent techniques in deep learning proposed new approaches for anomaly detection in different contexts such as pedestrian monitoring, street fighting, and threat detection. In this paper, we propose a new anomaly detection framework applied to the detection of aggressive driving behavior. Our contribution consists in the development of a 3D neural network architecture, based on the state-of-the-art EfficientNet 2D image classifier, for the aggressive driving detection in videos. We propose an EfficientNet3D CNN feature extractor for video analysis, and we compare it with existing feature extractors. We also created a dataset of car drifting in Saudi Arabian context https://www.youtube.com/watch?v=vLzgye1-d1k . To the best of our knowledge, this is the first work that addresses the problem of aggressive driving behavior using deep learning.

0.9CVNov 18, 2019

AI-based Pilgrim Detection using Convolutional Neural Networks

Marwa Ben Jabra, Adel Ammar, Anis Koubaa et al.

Pilgrimage represents the most important Islamic religious gathering in the world where millions of pilgrims visit the holy places of Makkah and Madinah to perform their rituals. The safety and security of pilgrims is the highest priority for the authorities. In Makkah, 5000 cameras are spread around the holy for monitoring pilgrims, but it is almost impossible to track all events by humans considering the huge number of images collected every second. To address this issue, we propose to use artificial intelligence technique based on deep learning and convolution neural networks to detect and identify Pilgrims and their features. For this purpose, we built a comprehensive dataset for the detection of pilgrims and their genders. Then, we develop two convolutional neural networks based on YOLOv3 and Faster-RCNN for the detection of Pilgrims. Experiments results show that Faster RCNN with Inception v2 feature extractor provides the best mean average precision over all classes of 51%.

2.6CVNov 11, 2019

Activity Monitoring of Islamic Prayer (Salat) Postures using Deep Learning

Anis Koubaa, Adel Ammar, Bilel Benjdira et al.

In the Muslim community, the prayer (i.e. Salat) is the second pillar of Islam, and it is the most essential and fundamental worshiping activity that believers have to perform five times a day. From a gestures' perspective, there are predefined human postures that must be performed in a precise manner. However, for several people, these postures are not correctly performed, due to being new to Salat or even having learned prayers in an incorrect manner. Furthermore, the time spent in each posture has to be balanced. To address these issues, we propose to develop an artificial intelligence assistive framework that guides worshippers to evaluate the correctness of the postures of their prayers. This paper represents the first step to achieve this objective and addresses the problem of the recognition of the basic gestures of Islamic prayer using Convolutional Neural Networks (CNN). The contribution of this paper lies in building a dataset for the basic Salat positions, and train a YOLOv3 neural network for the recognition of the gestures. Experimental results demonstrate that the mean average precision attains 85% for a training dataset of 764 images of the different postures. To the best of our knowledge, this is the first work that addresses human activity recognition of Salat using deep learning.

6.0CVOct 16, 2019Code

Aerial Images Processing for Car Detection using Convolutional Neural Networks: Comparison between Faster R-CNN and YoloV3

Adel Ammar, Anis Koubaa, Mohanned Ahmed et al.

In this paper, we address the problem of car detection from aerial images using Convolutional Neural Networks (CNN). This problem presents additional challenges as compared to car (or any object) detection from ground images because features of vehicles from aerial images are more difficult to discern. To investigate this issue, we assess the performance of two state-of-the-art CNN algorithms, namely Faster R-CNN, which is the most popular region-based algorithm, and YOLOv3, which is known to be the fastest detection algorithm. We analyze two datasets with different characteristics to check the impact of various factors, such as UAV's altitude, camera resolution, and object size. A total of 39 training experiments were conducted to account for the effect of different hyperparameter values. The objective of this work is to conduct the most robust and exhaustive comparison between these two cutting-edge algorithms on the specific domain of aerial images. By using a variety of metrics, we show that YOLOv3 yields better performance in most configurations, except that it exhibits a lower recall and less confident detections when object sizes and scales in the testing dataset differ largely from those in the training dataset.

12.3ROJun 22, 2019

Micro Air Vehicle Link (MAVLink) in a Nutshell: A Survey

Anis Koubaa, Azza Allouch, Maram Alajlan et al.

The Micro Air Vehicle Link (MAVLink in short) is a communication protocol for unmanned systems (e.g., drones, robots). It specifies a comprehensive set of messages exchanged between unmanned systems and ground stations. This protocol is used in major autopilot systems, mainly ArduPilot and PX4, and provides powerful features not only for monitoring and controlling unmanned systems missions but also for their integration into the Internet. However, there is no technical survey and/or tutorial in the literature that presents these features or explains how to make use of them. Most of the references are online tutorials and basic technical reports, and none of them presents comprehensive and systematic coverage of the protocol. In this paper, we address this gap, and we propose an overview of the MAVLink protocol, the difference between its versions, and its potential in enabling Internet connectivity to unmanned systems. We also discuss the security aspects of MAVLink. To the best of our knowledge, this is the first technical survey and tutorial on the MAVLink protocol, which represents an important reference for unmanned systems users and developers.

2.7CRApr 30, 2019

BlockLoc: Secure Localization in the Internet-of-Things using Blockchain

Omar Cheikhrouhou, Anis Koubaa

Several IoT applications are tightly dependent on the locations of the devices. However, localization algorithms can be easily compromised by injecting false locations. In this paper, we propose a Blockchain-based secure localization algorithm for the Internet of Things (IoT). The algorithm uses a public ledger (Blockchain) that contains nodes position and the list of their neighbor nodes. This ledger is shared among the IoT devices. Once an IoT device is localized its new position and the list of neighbor nodes are added to the Blockchain. This shared localization data will be used later by other IoT devices for their localization process. To avoid the attack where a malicious node sends a fake position, the correctness of the claimed position are verified before adding it to the Blockchain. Moreover, data exchanged between nodes (IoT devices) are signed to guarantee their authenticity and integrity. The integration of these security mechanisms into the localization process permits to exclude false data and therefore reduces the localization error. The simulation results show that adding the proposed security mechanism improves the localization accuracy of the algorithm when running in the presence of malicious nodes.

6.2ROApr 20, 2019

Qualitative and Quantitative Risk Analysis and Safety Assessment of Unmanned Aerial Vehicles Missions over the Internet

Azza Allouch, Anis Koubaa, Mohamed Khalgui et al.

In the last few years, Unmanned Aerial Vehicles (UAVs) are making a revolution as an emerging technology with many different applications in the military, civilian, and commercial fields. The advent of autonomous drones has initiated serious challenges, including how to maintain their safe operation during their missions. The safe operation of UAVs remains an open and sensitive issue since any unexpected behavior of the drone or any hazard would lead to potential risks that might be very severe. The motivation behind this work is to propose a methodology for the safety assurance of drones over the Internet {(Internet of drones (IoD))}. Two approaches will be used in performing the safety analysis: (1) a qualitative safety analysis approach, and (2) a quantitative safety analysis approach. The first approach uses the international safety standards, namely ISO 12100 and ISO 13849 to assess the safety of drone's missions by focusing on qualitative assessment techniques. The methodology starts with hazard identification, risk assessment, risk mitigation, and finally, draws the safety recommendations associated with a drone delivery use case. The second approach presents a method for the quantitative safety assessment using Bayesian Networks (BN) for probabilistic modeling. BN utilizes the information provided by the first approach to model the safety risks related to UAVs' flights. An illustrative UAV crash scenario is presented as a case study, followed by scenario analysis, to demonstrate the applicability of the proposed approach. These two analyses, qualitative and quantitative, enable { all involved stakeholders} to detect, explore and address the risks of UAV flights, which will help the industry to better manage the safety concerns of UAVs.

3.5ROApr 5, 2019

Towards a Realistic Simulation Framework for Vehicular Platooning Applications

Bruno Vieira, Ricardo Severino, Anis Koubaa et al.

Cooperative vehicle platooning applications increasingly demand realistic simulation tools to ease their validation and to bridge the gap between development and real-world deployment. However, their complexity and cost often hinder its validation in the real world. In this paper, we propose a realistic simulation framework for vehicular platoons that integrates Gazebo with OMNeT++ over Robot Operating System (ROS) to support the simulation of realistic scenarios of autonomous vehicular platoons and their cooperative control.

3.5ROJan 19, 2019

Service-Oriented Software Architecture for Cloud Robotics

Anis Koubaa

In this article, we present an overview of the use of service-oriented architecture and Web services in developing robotics applications and software integrated with the Internet and the Cloud. This is a recent trend that emerged since 2010 from the concept of cloud robotics, which leverages the use of cloud infrastructures for robotics applications following a service-oriented architecture approach. In particular, we distinguish two main categories: (\textit{i.}) virtualization of robotics systems and (\textit{ii.}) computation offloading from robots to cloud-based services. We discuss the main approaches proposed in the literature to design robotics systems through the Web and their integration to the cloud through a service-oriented computing framework.

12.1RODec 28, 2018

Car Detection using Unmanned Aerial Vehicles: Comparison between Faster R-CNN and YOLOv3

Bilel Benjdira, Taha Khursheed, Anis Koubaa et al.

Unmanned Aerial Vehicles are increasingly being used in surveillance and traffic monitoring thanks to their high mobility and ability to cover areas at different altitudes and locations. One of the major challenges is to use aerial images to accurately detect cars and count them in real-time for traffic monitoring purposes. Several deep learning techniques were recently proposed based on convolution neural network (CNN) for real-time classification and recognition in computer vision. However, their performance depends on the scenarios where they are used. In this paper, we investigate the performance of two state-of-the-art CNN algorithms, namely Faster R-CNN and YOLOv3, in the context of car detection from aerial images. We trained and tested these two models on a large car dataset taken from UAVs. We demonstrated in this paper that YOLOv3 outperforms Faster R-CNN in sensitivity and processing time, although they are comparable in the precision metric.