Bilel Benjdira

CV
h-index190
26papers
960citations
Novelty24%
AI Score50

26 Papers

CVApr 18Code
NTIRE 2026 Challenge on Single Image Reflection Removal in the Wild: Datasets, Results, and Methods

Jie Cai, Kangning Yang, Zhiyuan Li et al.

In this paper, we review the NTIRE 2026 challenge on single-image reflection removal (SIRR) in the wild. SIRR is a fundamental task in image restoration. Despite progress in academic research, most methods are tested on synthetic images or limited real-world images, creating a gap in real-world applications. In this challenge, we provide participants with the OpenRR-5k dataset. This dataset requires participants to process real-world images covering a range of reflection scenarios and intensities, aiming to generate clean images without reflections. The challenge attracted more than 100 registrations, with eleven of them participating in the final testing phase. The top-ranked methods advanced the state-of-the-art reflection removal performance and earned unanimous recognition from five experts in the field. The proposed OpenRR-5k dataset is available at https://huggingface.co/datasets/qiuzhangTiTi/OpenRR-5k, and the homepage of this challenge is at https://github.com/caijie0620/OpenRR-5k.

CVApr 13Code
NTIRE 2026 Challenge on Robust AI-Generated Image Detection in the Wild

Aleksandr Gushchin, Khaled Abud, Ekaterina Shumitskaya et al.

This paper presents an overview of the NTIRE 2026 Challenge on Robust AI-Generated Image Detection in the Wild, held in conjunction with the NTIRE workshop at CVPR 2026. The goal of this challenge was to develop detection models capable of distinguishing real images from generated ones in realistic scenarios: the images are often transformed (cropped, resized, compressed, blurred) for practical usage, and therefore, the detection models should be robust to such transformations. The challenge is based on a novel dataset consisting of 108,750 real and 185,750 AI-generated images from 42 generators comprising a large variety of open-source and closed-source models of various architectures, augmented with 36 image transformations. Methods were evaluated using ROC AUC on the full test set, including both transformed and untransformed images. A total of 511 participants registered, with 20 teams submitting valid final solutions. This report provides a comprehensive overview of the challenge, describes the proposed solutions, and can be used as a valuable reference for researchers and practitioners in increasing the robustness of the detection models to real-world transformations.

CVMar 1, 2023Code
TAU: A Framework for Video-Based Traffic Analytics Leveraging Artificial Intelligence and Unmanned Aerial Systems

Bilel Benjdira, Anis Koubaa, Ahmad Taher Azar et al.

Smart traffic engineering and intelligent transportation services are in increasing demand from governmental authorities to optimize traffic performance and thus reduce energy costs, increase the drivers' safety and comfort, ensure traffic laws enforcement, and detect traffic violations. In this paper, we address this challenge, and we leverage the use of Artificial Intelligence (AI) and Unmanned Aerial Vehicles (UAVs) to develop an AI-integrated video analytics framework, called TAU (Traffic Analysis from UAVs), for automated traffic analytics and understanding. Unlike previous works on traffic video analytics, we propose an automated object detection and tracking pipeline from video processing to advanced traffic understanding using high-resolution UAV images. TAU combines six main contributions. First, it proposes a pre-processing algorithm to adapt the high-resolution UAV image as input to the object detector without lowering the resolution. This ensures an excellent detection accuracy from high-quality features, particularly the small size of detected objects from UAV images. Second, it introduces an algorithm for recalibrating the vehicle coordinates to ensure that vehicles are uniquely identified and tracked across the multiple crops of the same frame. Third, it presents a speed calculation algorithm based on accumulating information from successive frames. Fourth, TAU counts the number of vehicles per traffic zone based on the Ray Tracing algorithm. Fifth, TAU has a fully independent algorithm for crossroad arbitration based on the data gathered from the different zones surrounding it. Sixth, TAU introduces a set of algorithms for extracting twenty-four types of insights from the raw data collected. The code is shared here: https://github.com/bilel-bj/TAU. Video demonstrations are provided here: https://youtu.be/wXJV0H7LviU and here: https://youtu.be/kGv0gmtVEbI.

ROAug 22, 2023Code
ROSGPT_Vision: Commanding Robots Using Only Language Models' Prompts

Bilel Benjdira, Anis Koubaa, Anas M. Ali

In this paper, we argue that the next generation of robots can be commanded using only Language Models' prompts. Every prompt interrogates separately a specific Robotic Modality via its Modality Language Model (MLM). A central Task Modality mediates the whole communication to execute the robotic mission via a Large Language Model (LLM). This paper gives this new robotic design pattern the name of: Prompting Robotic Modalities (PRM). Moreover, this paper applies this PRM design pattern in building a new robotic framework named ROSGPT_Vision. ROSGPT_Vision allows the execution of a robotic task using only two prompts: a Visual and an LLM prompt. The Visual Prompt extracts, in natural language, the visual semantic features related to the task under consideration (Visual Robotic Modality). Meanwhile, the LLM Prompt regulates the robotic reaction to the visual description (Task Modality). The framework automates all the mechanisms behind these two prompts. The framework enables the robot to address complex real-world scenarios by processing visual data, making informed decisions, and carrying out actions automatically. The framework comprises one generic vision module and two independent ROS nodes. As a test application, we used ROSGPT_Vision to develop CarMate, which monitors the driver's distraction on the roads and makes real-time vocal notifications to the driver. We showed how ROSGPT_Vision significantly reduced the development cost compared to traditional methods. We demonstrated how to improve the quality of the application by optimizing the prompting strategies, without delving into technical details. ROSGPT_Vision is shared with the community (link: https://github.com/bilel-bj/ROSGPT_Vision) to advance robotic research in this direction and to build more robotic frameworks that implement the PRM design pattern and enables controlling robots using only prompts.

CVApr 23Code
Trust-SSL: Additive-Residual Selective Invariance for Robust Aerial Self-Supervised Learning

Wadii Boulila, Adel Ammar, Bilel Benjdira et al.

Self-supervised learning (SSL) is a standard approach for representation learning in aerial imagery. Existing methods enforce invariance between augmented views, which works well when augmentations preserve semantic content. However, aerial images are frequently degraded by haze, motion blur, rain, and occlusion that remove critical evidence. Enforcing alignment between a clean and a severely degraded view can introduce spurious structure into the latent space. This study proposes a training strategy and architectural modification to enhance SSL robustness to such corruptions. It introduces a per-sample, per-factor trust weight into the alignment objective, combined with the base contrastive loss as an additive residual. A stop-gradient is applied to the trust weight instead of a multiplicative gate. While a multiplicative gate is a natural choice, experiments show it impairs the backbone, whereas our additive-residual approach improves it. Using a 200-epoch protocol on a 210,000-image corpus, the method achieves the highest mean linear-probe accuracy among six backbones on EuroSAT, AID, and NWPU-RESISC45 (90.20% compared to 88.46% for SimCLR and 89.82% for VICReg). It yields the largest improvements under severe information-erasing corruptions on EuroSAT (+19.9 points on haze at s=5 over SimCLR). The method also demonstrates consistent gains of +1 to +3 points in Mahalanobis AUROC on a zero-shot cross-domain stress test using BDD100K weather splits. Two ablations (scalar uncertainty and cosine gate) indicate the additive-residual formulation is the primary source of these improvements. An evidential variant using Dempster-Shafer fusion introduces interpretable signals of conflict and ignorance. These findings offer a concrete design principle for uncertainty-aware SSL. Code is publicly available at https://github.com/WadiiBoulila/trust-ssl.

CVApr 19
Low Light Image Enhancement Challenge at NTIRE 2026

George Ciubotariu, Sharif S M A, Abdur Rehman et al.

This paper presents a comprehensive review of the NTIRE 2026 Low Light Image Enhancement Challenge, highlighting the proposed solutions and final results. The objective of this challenge is to identify effective networks capable of producing clearer and visually compelling images in diverse and challenging conditions by learning representative visual cues with the purpose of restoring information loss due to low-contrast and noisy images. A total of 195 participants registered for the first track and 153 for the second track of the competition, and 22 teams ultimately submitted valid entries. This paper thoroughly evaluates the state-of-the-art advances in (joint denoising and) low-light image enhancement, showcasing the significant progress in the field, while leveraging samples of our novel dataset.

CVApr 12
NTIRE 2026 The Second Challenge on Day and Night Raindrop Removal for Dual-Focused Images: Methods and Results

Xin Li, Yeying Jin, Suhang Yao et al.

This paper presents an overview of the NTIRE 2026 Second Challenge on Day and Night Raindrop Removal for Dual-Focused Images. Building upon the success of the first edition, this challenge attracted a wide range of impressive solutions, all developed and evaluated on our real-world Raindrop Clarity dataset~\cite{jin2024raindrop}. For this edition, we adjust the dataset with 14,139 images for training, 407 images for validation, and 593 images for testing. The primary goal of this challenge is to establish a strong and practical benchmark for the removal of raindrops under various illumination and focus conditions. In total, 168 teams have registered for the competition, and 17 teams submitted valid final solutions and fact sheets for the testing phase. The submitted methods achieved strong performance on the Raindrop Clarity dataset, demonstrating the growing progress in this challenging task.

CVApr 16
The Fourth Challenge on Image Super-Resolution ($\times$4) at NTIRE 2026: Benchmark Results and Method Overview

Zheng Chen, Kai Liu, Jingkai Wang et al.

This paper presents the NTIRE 2026 image super-resolution ($\times$4) challenge, one of the associated competitions of the NTIRE 2026 Workshop at CVPR 2026. The challenge aims to reconstruct high-resolution (HR) images from low-resolution (LR) inputs generated through bicubic downsampling with a $\times$4 scaling factor. The objective is to develop effective super-resolution solutions and analyze recent advances in the field. To reflect the evolving objectives of image super-resolution, the challenge includes two tracks: (1) a restoration track, which emphasizes pixel-wise fidelity and ranks submissions based on PSNR; and (2) a perceptual track, which focuses on visual realism and evaluates results using a perceptual score. A total of 194 participants registered for the challenge, with 31 teams submitting valid entries. This report summarizes the challenge design, datasets, evaluation protocol, main results, and methods of participating teams. The challenge provides a unified benchmark and offers insights into current progress and future directions in image super-resolution.

CVAug 30, 2023
Early Detection of Red Palm Weevil Infestations using Deep Learning Classification of Acoustic Signals

Wadii Boulila, Ayyub Alzahem, Anis Koubaa et al.

The Red Palm Weevil (RPW), also known as the palm weevil, is considered among the world's most damaging insect pests of palms. Current detection techniques include the detection of symptoms of RPW using visual or sound inspection and chemical detection of volatile signatures generated by infested palm trees. However, efficient detection of RPW diseases at an early stage is considered one of the most challenging issues for cultivating date palms. In this paper, an efficient approach to the early detection of RPW is proposed. The proposed approach is based on RPW sound activities being recorded and analyzed. The first step involves the conversion of sound data into images based on a selected set of features. The second step involves the combination of images from the same sound file but computed by different features into a single image. The third step involves the application of different Deep Learning (DL) techniques to classify resulting images into two classes: infested and not infested. Experimental results show good performances of the proposed approach for RPW detection using different DL techniques, namely MobileNetV2, ResNet50V2, ResNet152V2, VGG16, VGG19, DenseNet121, DenseNet201, Xception, and InceptionV3. The proposed approach outperformed existing techniques for public datasets.

CLOct 16, 2023
Prediction of Arabic Legal Rulings using Large Language Models

Adel Ammar, Anis Koubaa, Bilel Benjdira et al.

In the intricate field of legal studies, the analysis of court decisions is a cornerstone for the effective functioning of the judicial system. The ability to predict court outcomes helps judges during the decision-making process and equips lawyers with invaluable insights, enhancing their strategic approaches to cases. Despite its significance, the domain of Arabic court analysis remains under-explored. This paper pioneers a comprehensive predictive analysis of Arabic court decisions on a dataset of 10,813 commercial court real cases, leveraging the advanced capabilities of the current state-of-the-art large language models. Through a systematic exploration, we evaluate three prevalent foundational models (LLaMA-7b, JAIS-13b, and GPT3.5-turbo) and three training paradigms: zero-shot, one-shot, and tailored fine-tuning. Besides, we assess the benefit of summarizing and/or translating the original Arabic input texts. This leads to a spectrum of 14 model variants, for which we offer a granular performance assessment with a series of different metrics (human assessment, GPT evaluation, ROUGE, and BLEU scores). We show that all variants of LLaMA models yield limited performance, whereas GPT-3.5-based models outperform all other models by a wide margin, surpassing the average score of the dedicated Arabic-centric JAIS model by 50%. Furthermore, we show that all scores except human evaluation are inconsistent and unreliable for assessing the performance of large language models on court decision predictions. This study paves the way for future research, bridging the gap between computational linguistics and Arabic legal analytics.

IVMar 15, 2022
Securing the Classification of COVID-19 in Chest X-ray Images: A Privacy-Preserving Deep Learning Approach

Wadii Boulila, Adel Ammar, Bilel Benjdira et al.

Deep learning (DL) is being increasingly utilized in healthcare-related fields due to its outstanding efficiency. However, we have to keep the individual health data used by DL models private and secure. Protecting data and preserving the privacy of individuals has become an increasingly prevalent issue. The gap between the DL and privacy communities must be bridged. In this paper, we propose privacy-preserving deep learning (PPDL)-based approach to secure the classification of Chest X-ray images. This study aims to use Chest X-ray images to their fullest potential without compromising the privacy of the data that it contains. The proposed approach is based on two steps: encrypting the dataset using partially homomorphic encryption and training/testing the DL algorithm over the encrypted images. Experimental results on the COVID-19 Radiography database show that the MobileNetV2 model achieves an accuracy of 94.2% over the plain data and 93.3% over the encrypted data.

CVSep 21, 2023
License Plate Super-Resolution Using Diffusion Models

Sawsan AlHalawani, Bilel Benjdira, Adel Ammar et al.

In surveillance, accurately recognizing license plates is hindered by their often low quality and small dimensions, compromising recognition precision. Despite advancements in AI-based image super-resolution, methods like Convolutional Neural Networks (CNNs) and Generative Adversarial Networks (GANs) still fall short in enhancing license plate images. This study leverages the cutting-edge diffusion model, which has consistently outperformed other deep learning techniques in image restoration. By training this model using a curated dataset of Saudi license plates, both in low and high resolutions, we discovered the diffusion model's superior efficacy. The method achieves a 12.55\% and 37.32% improvement in Peak Signal-to-Noise Ratio (PSNR) over SwinIR and ESRGAN, respectively. Moreover, our method surpasses these techniques in terms of Structural Similarity Index (SSIM), registering a 4.89% and 17.66% improvement over SwinIR and ESRGAN, respectively. Furthermore, 92% of human evaluators preferred our images over those from other algorithms. In essence, this research presents a pioneering solution for license plate super-resolution, with tangible potential for surveillance systems.

CVApr 26, 2023
Streamlined Global and Local Features Combinator (SGLC) for High Resolution Image Dehazing

Bilel Benjdira, Anas M. Ali, Anis Koubaa

Image Dehazing aims to remove atmospheric fog or haze from an image. Although the Dehazing models have evolved a lot in recent years, few have precisely tackled the problem of High-Resolution hazy images. For this kind of image, the model needs to work on a downscaled version of the image or on cropped patches from it. In both cases, the accuracy will drop. This is primarily due to the inherent failure to combine global and local features when the image size increases. The Dehazing model requires global features to understand the general scene peculiarities and the local features to work better with fine and pixel details. In this study, we propose the Streamlined Global and Local Features Combinator (SGLC) to solve these issues and to optimize the application of any Dehazing model to High-Resolution images. The SGLC contains two successive blocks. The first is the Global Features Generator (GFG) which generates the first version of the Dehazed image containing strong global features. The second block is the Local Features Enhancer (LFE) which improves the local feature details inside the previously generated image. When tested on the Uformer architecture for Dehazing, SGLC increased the PSNR metric by a significant margin. Any other model can be incorporated inside the SGLC process to improve its efficiency on High-Resolution input data.

CVMar 15, 2022
Parking Analytics Framework using Deep Learning

Bilel Benjdira, Anis Koubaa, Wadii Boulila et al.

With the number of vehicles continuously increasing, parking monitoring and analysis are becoming a substantial feature of modern cities. In this study, we present a methodology to monitor car parking areas and to analyze their occupancy in real-time. The solution is based on a combination between image analysis and deep learning techniques. It incorporates four building blocks put inside a pipeline: vehicle detection, vehicle tracking, manual annotation of parking slots, and occupancy estimation using the Ray Tracing algorithm. The aim of this methodology is to optimize the use of parking areas and to reduce the time wasted by daily drivers to find the right parking slot for their cars. Also, it helps to better manage the space of the parking areas and to discover misuse cases. A demonstration of the provided solution is shown in the following video link: https://www.youtube.com/watch?v=KbAt8zT14Tc.

CVSep 27, 2023
Guided Frequency Loss for Image Restoration

Bilel Benjdira, Anas M. Ali, Anis Koubaa

Image Restoration has seen remarkable progress in recent years. Many generative models have been adapted to tackle the known restoration cases of images. However, the interest in benefiting from the frequency domain is not well explored despite its major factor in these particular cases of image synthesis. In this study, we propose the Guided Frequency Loss (GFL), which helps the model to learn in a balanced way the image's frequency content alongside the spatial content. It aggregates three major components that work in parallel to enhance learning efficiency; a Charbonnier component, a Laplacian Pyramid component, and a Gradual Frequency component. We tested GFL on the Super Resolution and the Denoising tasks. We used three different datasets and three different architectures for each of them. We found that the GFL loss improved the PSNR metric in most implemented experiments. Also, it improved the training of the Super Resolution models in both SwinIR and SRGAN. In addition, the utility of the GFL loss increased better on constrained data due to the less stochasticity in the high frequencies' components among samples.

CVApr 27
Robust Deepfake Detection, NTIRE 2026 Challenge: Report

Benedikt Hopf, Radu Timofte, Chenfan Qu et al.

Robustness is a long-overlooked problem in deepfake detection. However, detection performance is nearly worthless in the real world if it suffers under exposure to even slight image degradation. In addition to weaker degradations that can accidentally occur in the image processing pipeline, there is another risk of malicious deepfakes that specifically introduce degradations, purposefully exploiting the detector's weaknesses in that regard. Here, we present an overview of the NTIRE 2026 Robust Deepfake Detection Challenge, which specifically addresses that problem. Participants were tasked with building a detector that would later be tested on an unknown test-set, which included both common and uncommon degradations of various strengths. With a total number of 337 participants and 57 submissions to the final leaderboard, the first edition of the challenge was well received. To ensure the reliability of the results, participants were given only 24h to complete the test run with no labels provided, limiting the possibility of training on the test data. Furthermore, the top solutions were scored on a private test-set to detect any such overfitting. This report presents the competition setting, dataset preparation, as well as details and performance of methods. Top methods rely on large foundation models, ensembles, and degradation training to combine generality and robustness.

CVOct 15, 2025
NTIRE 2025 Challenge on Low Light Image Enhancement: Methods and Results

Xiaoning Liu, Zongwei Wu, Florin-Alexandru Vasluianu et al.

This paper presents a comprehensive review of the NTIRE 2025 Low-Light Image Enhancement (LLIE) Challenge, highlighting the proposed solutions and final outcomes. The objective of the challenge is to identify effective networks capable of producing brighter, clearer, and visually compelling images under diverse and challenging conditions. A remarkable total of 762 participants registered for the competition, with 28 teams ultimately submitting valid entries. This paper thoroughly evaluates the state-of-the-art advancements in LLIE, showcasing the significant progress.

CVJun 18, 2025
NTIRE 2025 Image Shadow Removal Challenge Report

Florin-Alexandru Vasluianu, Tim Seizinger, Zhuyun Zhou et al.

This work examines the findings of the NTIRE 2025 Shadow Removal Challenge. A total of 306 participants have registered, with 17 teams successfully submitting their solutions during the final evaluation phase. Following the last two editions, this challenge had two evaluation tracks: one focusing on reconstruction fidelity and the other on visual perception through a user study. Both tracks were evaluated with images from the WSRD+ dataset, simulating interactions between self- and cast-shadows with a large number of diverse objects, textures, and materials.

CVApr 16, 2025
The Tenth NTIRE 2025 Image Denoising Challenge Report

Lei Sun, Hang Guo, Bin Ren et al.

This paper presents an overview of the NTIRE 2025 Image Denoising Challenge (σ = 50), highlighting the proposed methodologies and corresponding results. The primary objective is to develop a network architecture capable of achieving high-quality denoising performance, quantitatively evaluated using PSNR, without constraints on computational complexity or model size. The task assumes independent additive white Gaussian noise (AWGN) with a fixed noise level of 50. A total of 290 participants registered for the challenge, with 20 teams successfully submitting valid results, providing insights into the current state-of-the-art in image denoising.

CVApr 20, 2025
NTIRE 2025 Challenge on Image Super-Resolution ($\times$4): Methods and Results

Zheng Chen, Kai Liu, Jue Gong et al.

This paper presents the NTIRE 2025 image super-resolution ($\times$4) challenge, one of the associated competitions of the 10th NTIRE Workshop at CVPR 2025. The challenge aims to recover high-resolution (HR) images from low-resolution (LR) counterparts generated through bicubic downsampling with a $\times$4 scaling factor. The objective is to develop effective network designs or solutions that achieve state-of-the-art SR performance. To reflect the dual objectives of image SR research, the challenge includes two sub-tracks: (1) a restoration track, emphasizes pixel-wise accuracy and ranks submissions based on PSNR; (2) a perceptual track, focuses on visual realism and ranks results by a perceptual score. A total of 286 participants registered for the competition, with 25 teams submitting valid entries. This report summarizes the challenge design, datasets, evaluation protocol, the main results, and methods of each team. The challenge serves as a benchmark to advance the state of the art and foster progress in image SR.

CVAug 28, 2025
Dual-Stage Global and Local Feature Framework for Image Dehazing

Anas M. Ali, Anis Koubaa, Bilel Benjdira

Addressing the challenge of removing atmospheric fog or haze from digital images, known as image dehazing, has recently gained significant traction in the computer vision community. Although contemporary dehazing models have demonstrated promising performance, few have thoroughly investigated high-resolution imagery. In such scenarios, practitioners often resort to downsampling the input image or processing it in smaller patches, which leads to a notable performance degradation. This drop is primarily linked to the difficulty of effectively combining global contextual information with localized, fine-grained details as the spatial resolution grows. In this chapter, we propose a novel framework, termed the Streamlined Global and Local Features Combinator (SGLC), to bridge this gap and enable robust dehazing for high-resolution inputs. Our approach is composed of two principal components: the Global Features Generator (GFG) and the Local Features Enhancer (LFE). The GFG produces an initial dehazed output by focusing on broad contextual understanding of the scene. Subsequently, the LFE refines this preliminary output by enhancing localized details and pixel-level features, thereby capturing the interplay between global appearance and local structure. To evaluate the effectiveness of SGLC, we integrated it with the Uformer architecture, a state-of-the-art dehazing model. Experimental results on high-resolution datasets reveal a considerable improvement in peak signal-to-noise ratio (PSNR) when employing SGLC, indicating its potency in addressing haze in large-scale imagery. Moreover, the SGLC design is model-agnostic, allowing any dehazing network to be augmented with the proposed global-and-local feature fusion mechanism. Through this strategy, practitioners can harness both scene-level cues and granular details, significantly improving visual fidelity in high-resolution environments.

CVApr 18, 2020
DriftNet: Aggressive Driving Behavior Classification using 3D EfficientNet Architecture

Alam Noor, Bilel Benjdira, Adel Ammar et al.

Aggressive driving (i.e., car drifting) is a dangerous behavior that puts human safety and life into a significant risk. This behavior is considered as an anomaly concerning the regular traffic in public transportation roads. Recent techniques in deep learning proposed new approaches for anomaly detection in different contexts such as pedestrian monitoring, street fighting, and threat detection. In this paper, we propose a new anomaly detection framework applied to the detection of aggressive driving behavior. Our contribution consists in the development of a 3D neural network architecture, based on the state-of-the-art EfficientNet 2D image classifier, for the aggressive driving detection in videos. We propose an EfficientNet3D CNN feature extractor for video analysis, and we compare it with existing feature extractors. We also created a dataset of car drifting in Saudi Arabian context https://www.youtube.com/watch?v=vLzgye1-d1k . To the best of our knowledge, this is the first work that addresses the problem of aggressive driving behavior using deep learning.

CVNov 11, 2019
Activity Monitoring of Islamic Prayer (Salat) Postures using Deep Learning

Anis Koubaa, Adel Ammar, Bilel Benjdira et al.

In the Muslim community, the prayer (i.e. Salat) is the second pillar of Islam, and it is the most essential and fundamental worshiping activity that believers have to perform five times a day. From a gestures' perspective, there are predefined human postures that must be performed in a precise manner. However, for several people, these postures are not correctly performed, due to being new to Salat or even having learned prayers in an incorrect manner. Furthermore, the time spent in each posture has to be balanced. To address these issues, we propose to develop an artificial intelligence assistive framework that guides worshippers to evaluate the correctness of the postures of their prayers. This paper represents the first step to achieve this objective and addresses the problem of the recognition of the basic gestures of Islamic prayer using Convolutional Neural Networks (CNN). The contribution of this paper lies in building a dataset for the basic Salat positions, and train a YOLOv3 neural network for the recognition of the gestures. Experimental results demonstrate that the mean average precision attains 85% for a training dataset of 764 images of the different postures. To the best of our knowledge, this is the first work that addresses human activity recognition of Salat using deep learning.

CVOct 16, 2019
Aerial Images Processing for Car Detection using Convolutional Neural Networks: Comparison between Faster R-CNN and YoloV3

Adel Ammar, Anis Koubaa, Mohanned Ahmed et al.

In this paper, we address the problem of car detection from aerial images using Convolutional Neural Networks (CNN). This problem presents additional challenges as compared to car (or any object) detection from ground images because features of vehicles from aerial images are more difficult to discern. To investigate this issue, we assess the performance of two state-of-the-art CNN algorithms, namely Faster R-CNN, which is the most popular region-based algorithm, and YOLOv3, which is known to be the fastest detection algorithm. We analyze two datasets with different characteristics to check the impact of various factors, such as UAV's altitude, camera resolution, and object size. A total of 39 training experiments were conducted to account for the effect of different hyperparameter values. The objective of this work is to conduct the most robust and exhaustive comparison between these two cutting-edge algorithms on the specific domain of aerial images. By using a variety of metrics, we show that YOLOv3 yields better performance in most configurations, except that it exhibits a lower recall and less confident detections when object sizes and scales in the testing dataset differ largely from those in the training dataset.

CVMay 8, 2019
Unsupervised Domain Adaptation using Generative Adversarial Networks for Semantic Segmentation of Aerial Images

Bilel Benjdira, Yakoub Bazi, Anis Koubaa et al.

Segmenting aerial images is being of great potential in surveillance and scene understanding of urban areas. It provides a mean for automatic reporting of the different events that happen in inhabited areas. This remarkably promotes public safety and traffic management applications. After the wide adoption of convolutional neural networks methods, the accuracy of semantic segmentation algorithms could easily surpass 80% if a robust dataset is provided. Despite this success, the deployment of a pre-trained segmentation model to survey a new city that is not included in the training set significantly decreases the accuracy. This is due to the domain shift between the source dataset on which the model is trained and the new target domain of the new city images. In this paper, we address this issue and consider the challenge of domain adaptation in semantic segmentation of aerial images. We design an algorithm that reduces the domain shift impact using Generative Adversarial Networks (GANs). In the experiments, we test the proposed methodology on the International Society for Photogrammetry and Remote Sensing (ISPRS) semantic segmentation dataset and found that our method improves the overall accuracy from 35% to 52% when passing from Potsdam domain (considered as source domain) to Vaihingen domain (considered as target domain). In addition, the method allows recovering efficiently the inverted classes due to sensor variation. In particular, it improves the average segmentation accuracy of the inverted classes due to sensor variation from 14% to 61%.

RODec 28, 2018
Car Detection using Unmanned Aerial Vehicles: Comparison between Faster R-CNN and YOLOv3

Bilel Benjdira, Taha Khursheed, Anis Koubaa et al.

Unmanned Aerial Vehicles are increasingly being used in surveillance and traffic monitoring thanks to their high mobility and ability to cover areas at different altitudes and locations. One of the major challenges is to use aerial images to accurately detect cars and count them in real-time for traffic monitoring purposes. Several deep learning techniques were recently proposed based on convolution neural network (CNN) for real-time classification and recognition in computer vision. However, their performance depends on the scenarios where they are used. In this paper, we investigate the performance of two state-of-the-art CNN algorithms, namely Faster R-CNN and YOLOv3, in the context of car detection from aerial images. We trained and tested these two models on a large car dataset taken from UAVs. We demonstrated in this paper that YOLOv3 outperforms Faster R-CNN in sensitivity and processing time, although they are comparable in the precision metric.