IVNov 2, 2022Code
End-to-end deep multi-score model for No-reference stereoscopic image quality assessmentOussama Messai, Aladine Chetouani
Deep learning-based quality metrics have recently given significant improvement in Image Quality Assessment (IQA). In the field of stereoscopic vision, information is evenly distributed with slight disparity to the left and right eyes. However, due to asymmetric distortion, the objective quality ratings for the left and right images would differ, necessitating the learning of unique quality indicators for each view. Unlike existing stereoscopic IQA measures which focus mainly on estimating a global human score, we suggest incorporating left, right, and stereoscopic objective scores to extract the corresponding properties of each view, and so forth estimating stereoscopic image quality without reference. Therefore, we use a deep multi-score Convolutional Neural Network (CNN). Our model has been trained to perform four tasks: First, predict the left view's quality. Second, predict the quality of the left view. Third and fourth, predict the quality of the stereo view and global quality, respectively, with the global score serving as the ultimate quality. Experiments are conducted on Waterloo IVC 3D Phase 1 and Phase 2 databases. The results obtained show the superiority of our method when comparing with those of the state-of-the-art. The implementation code can be found at: https://github.com/o-messai/multi-score-SIQA
CVNov 4, 2022
PCQA-GRAPHPOINT: Efficients Deep-Based Graph Metric For Point Cloud Quality AssessmentMarouane Tliba, Aladine Chetouani, Giuseppe Valenzise et al.
Following the advent of immersive technologies and the increasing interest in representing interactive geometrical format, 3D Point Clouds (PC) have emerged as a promising solution and effective means to display 3D visual information. In addition to other challenges in immersive applications, objective and subjective quality assessments of compressed 3D content remain open problems and an area of research interest. Yet most of the efforts in the research area ignore the local geometrical structures between points representation. In this paper, we overcome this limitation by introducing a novel and efficient objective metric for Point Clouds Quality Assessment, by learning local intrinsic dependencies using Graph Neural Network (GNN). To evaluate the performance of our method, two well-known datasets have been used. The results demonstrate the effectiveness and reliability of our solution compared to state-of-the-art metrics.
CVApr 13Code
GazeVaLM: A Multi-Observer Eye-Tracking Benchmark for Evaluating Clinical Realism in AI-Generated X-RaysDavid Wong, Zeynep Isik, Bin Wang et al.
We introduce GazeVaLM, a public eye-tracking dataset for studying clinical perception during chest radiograph authenticity assessment. The dataset comprises 960 gaze recordings from 16 expert radiologists interpreting 30 real and 30 synthetic chest X-rays (generated by diffusion based generative AI) under two conditions: diagnostic assessment and real-fake classification (Visual Turing test). For each image-observer pair, we provide raw gaze samples, fixation maps, scanpaths, saliency density maps, structured diagnostic labels, and authenticity judgments. We extend the protocol to 6 state-of-the-art multimodal LLMs, releasing their predicted diagnoses, authenticity labels, and confidence scores under matched conditions - enabling direct human-AI comparison at both decision and uncertainty levels. We further provide analyses of gaze agreement, inter-observer consistency, and benchmarking of radiologists versus LLMs in diagnostic accuracy and authenticity detection. GazeVaLM supports research in gaze modeling, clinical decision-making, human-AI comparison, generative image realism assessment, and uncertainty quantification. By jointly releasing visual attention data, clinical labels, and model predictions, we aim to facilitate reproducible research on how experts and AI systems perceive, interpret, and evaluate medical images. The dataset is available at https://huggingface.co/datasets/davidcwong/GazeVaLM.
IVApr 17, 2023
Transformer with Selective Shuffled Position Embedding and Key-Patch Exchange Strategy for Early Detection of Knee OsteoarthritisZhe Wang, Aladine Chetouani, Mohamed Jarraya et al.
Knee OsteoArthritis (KOA) is a widespread musculoskeletal disorder that can severely impact the mobility of older individuals. Insufficient medical data presents a significant obstacle for effectively training models due to the high cost associated with data labelling. Currently, deep learning-based models extensively utilize data augmentation techniques to improve their generalization ability and alleviate overfitting. However, conventional data augmentation techniques are primarily based on the original data and fail to introduce substantial diversity to the dataset. In this paper, we propose a novel approach based on the Vision Transformer (ViT) model with original Selective Shuffled Position Embedding (SSPE) and key-patch exchange strategies to obtain different input sequences as a method of data augmentation for early detection of KOA (KL-0 vs KL-2). More specifically, we fix and shuffle the position embedding of key and non-key patches, respectively. Then, for the target image, we randomly select other candidate images from the training set to exchange their key patches and thus obtain different input sequences. Finally, a hybrid loss function is developed by incorporating multiple loss functions for different types of the sequences. According to the experimental results, the generated data are considered valid as they lead to a notable improvement in the model's classification performance.
CLNov 14, 2023
Insights into Classifying and Mitigating LLMs' HallucinationsAlessandro Bruno, Pier Luigi Mazzeo, Aladine Chetouani et al.
The widespread adoption of large language models (LLMs) across diverse AI applications is proof of the outstanding achievements obtained in several tasks, such as text mining, text generation, and question answering. However, LLMs are not exempt from drawbacks. One of the most concerning aspects regards the emerging problematic phenomena known as "Hallucinations". They manifest in text generation systems, particularly in question-answering systems reliant on LLMs, potentially resulting in false or misleading information propagation. This paper delves into the underlying causes of AI hallucination and elucidates its significance in artificial intelligence. In particular, Hallucination classification is tackled over several tasks (Machine Translation, Question and Answer, Dialog Systems, Summarisation Systems, Knowledge Graph with LLMs, and Visual Question Answer). Additionally, we explore potential strategies to mitigate hallucinations, aiming to enhance the overall reliability of LLMs. Our research addresses this critical issue within the HeReFaNMi (Health-Related Fake News Mitigation) project, generously supported by NGI Search, dedicated to combating Health-Related Fake News dissemination on the Internet. This endeavour represents a concerted effort to safeguard the integrity of information dissemination in an age of evolving AI technologies.
CVJul 10, 2023
Automatic diagnosis of knee osteoarthritis severity using Swin transformerAymen Sekhri, Marouane Tliba, Mohamed Amine Kerkouri et al.
Knee osteoarthritis (KOA) is a widespread condition that can cause chronic pain and stiffness in the knee joint. Early detection and diagnosis are crucial for successful clinical intervention and management to prevent severe complications, such as loss of mobility. In this paper, we propose an automated approach that employs the Swin Transformer to predict the severity of KOA. Our model uses publicly available radiographic datasets with Kellgren and Lawrence scores to enable early detection and severity assessment. To improve the accuracy of our model, we employ a multi-prediction head architecture that utilizes multi-layer perceptron classifiers. Additionally, we introduce a novel training approach that reduces the data drift between multiple datasets to ensure the generalization ability of the model. The results of our experiments demonstrate the effectiveness and feasibility of our approach in predicting KOA severity accurately.
CVMay 14Code
CT-DegradBench: A Physics-Informed Benchmark for CT Degradation Detection and Severity EstimationYousra Nabila Taifour, Marouane Tliba, Zuheng Ming et al.
Computed tomography (CT) images are frequently degraded by acquisition artifacts, including noise, blur, streaking, aliasing, and metal artifacts. Yet CT enhancement is still largely evaluated using image quality metrics with limited perceptual and clinical validity, while existing datasets remain focused on isolated restoration tasks, hindering unified benchmarking across diverse degradation types. We present CT-DegradBench, a dataset and benchmark for CT degradation detection and severity estimation under controlled single- and mixed-artifact settings. CT-DegradBench enables systematic evaluation across multiple degradation families and severity levels within a common experimental framework. We further propose SeSpeCT (Semantic-Spectral CT degradation estimation), a framework that combines semantic priors from medical vision-language models with complementary frequency-domain cues for artifact analysis. SeSpeCT constructs a training-free semantic quality axis in the multimodal embedding space using radiology-informed text prompts, without task-specific fine-tuning, and combines it with spectral features that capture degradation-specific frequency patterns. The resulting representation enables joint prediction of artifact type and severity. Experimental results show that SeSpeCT consistently outperforms the evaluated baselines under both single- and mixed-degradation settings. The framework is available at https://github.com/yousranb/CT-DEGRADBENCH.
CVMar 15, 2023
Quality evaluation of point clouds: a novel no-reference approach using transformer-based architectureMarouane Tliba, Aladine Chetouani, Giuseppe Valenzise et al.
With the increased interest in immersive experiences, point cloud came to birth and was widely adopted as the first choice to represent 3D media. Besides several distortions that could affect the 3D content spanning from acquisition to rendering, efficient transmission of such volumetric content over traditional communication systems stands at the expense of the delivered perceptual quality. To estimate the magnitude of such degradation, employing quality metrics became an inevitable solution. In this work, we propose a novel deep-based no-reference quality metric that operates directly on the whole point cloud without requiring extensive pre-processing, enabling real-time evaluation over both transmission and rendering levels. To do so, we use a novel model design consisting primarily of cross and self-attention layers, in order to learn the best set of local semantic affinities while keeping the best combination of geometry and color information in multiple levels from basic features extraction to deep representation modeling.
IVOct 19, 2022
Deep-based quality assessment of medical images through domain adaptationMarouane Tliba, Aymen Sekhri, Mohamed Amine Kerkouri et al.
Predicting the quality of multimedia content is often needed in different fields. In some applications, quality metrics are crucial with a high impact, and can affect decision making such as diagnosis from medical multimedia. In this paper, we focus on such applications by proposing an efficient and shallow model for predicting the quality of medical images without reference from a small amount of annotated data. Our model is based on convolution self-attention that aims to model complex representation from relevant local characteristics of images, which itself slide over the image to interpolate the global quality score. We also apply domain adaptation learning in unsupervised and semi-supervised manner. The proposed model is evaluated through a dataset composed of several images and their corresponding subjective scores. The obtained results showed the efficiency of the proposed method, but also, the relevance of the applying domain adaptation to generalize over different multimedia domains regarding the downstream task of perceptual quality prediction. \footnote{Funded by the TIC-ART project, Regional fund (Region Centre-Val de Loire)}
CVSep 22, 2022
A domain adaptive deep learning solution for scanpath prediction of paintingsMohamed Amine Kerkouri, Marouane Tliba, Aladine Chetouani et al.
Cultural heritage understanding and preservation is an important issue for society as it represents a fundamental aspect of its identity. Paintings represent a significant part of cultural heritage, and are the subject of study continuously. However, the way viewers perceive paintings is strictly related to the so-called HVS (Human Vision System) behaviour. This paper focuses on the eye-movement analysis of viewers during the visual experience of a certain number of paintings. In further details, we introduce a new approach to predicting human visual attention, which impacts several cognitive functions for humans, including the fundamental understanding of a scene, and then extend it to painting images. The proposed new architecture ingests images and returns scanpaths, a sequence of points featuring a high likelihood of catching viewers' attention. We use an FCNN (Fully Convolutional Neural Network), in which we exploit a differentiable channel-wise selection and Soft-Argmax modules. We also incorporate learnable Gaussian distributions onto the network bottleneck to simulate visual attention process bias in natural scene images. Furthermore, to reduce the effect of shifts between different domains (i.e. natural images, painting), we urge the model to learn unsupervised general features from other domains using a gradient reversal classifier. The results obtained by our model outperform existing state-of-the-art ones in terms of accuracy and efficiency.
IVMar 23, 2023
Confidence-Driven Deep Learning Framework for Early Detection of Knee OsteoarthritisZhe Wang, Aladine Chetouani, Yung Hsin Chen et al.
Knee Osteoarthritis (KOA) is a prevalent musculoskeletal disorder that severely impacts mobility and quality of life, particularly among older adults. Its diagnosis often relies on subjective assessments using the Kellgren-Lawrence (KL) grading system, leading to variability in clinical evaluations. To address these challenges, we propose a confidence-driven deep learning framework for early KOA detection, focusing on distinguishing KL-0 and KL-2 stages. The Siamese-based framework integrates a novel multi-level feature extraction architecture with a hybrid loss strategy. Specifically, multi-level Global Average Pooling (GAP) layers are employed to extract features from varying network depths, ensuring comprehensive feature representation, while the hybrid loss strategy partitions training samples into high-, medium-, and low-confidence subsets. Tailored loss functions are applied to improve model robustness and effectively handle uncertainty in annotations. Experimental results on the Osteoarthritis Initiative (OAI) dataset demonstrate that the proposed framework achieves competitive accuracy, sensitivity, and specificity, comparable to those of expert radiologists. Cohen's kappa values (k > 0.85)) confirm substantial agreement, while McNemar's test (p > 0.05) indicates no statistically significant differences between the model and radiologists. Additionally, Confidence distribution analysis reveals that the model emulates radiologists' decision-making patterns. These findings highlight the potential of the proposed approach to serve as an auxiliary diagnostic tool, enhancing early KOA detection and reducing clinical workload.
CVFeb 20, 2023
Kernel function impact on convolutional neural networksM. Amine Mahmoudi, Aladine Chetouani, Fatma Boufera et al.
This paper investigates the usage of kernel functions at the different layers in a convolutional neural network. We carry out extensive studies of their impact on convolutional, pooling and fully-connected layers. We notice that the linear kernel may not be sufficiently effective to fit the input data distributions, whereas high order kernels prone to over-fitting. This leads to conclude that a trade-off between complexity and performance should be reached. We show how one can effectively leverage kernel functions, by introducing a more distortion aware pooling layers which reduces over-fitting while keeping track of the majority of the information fed into subsequent layers. We further propose Kernelized Dense Layers (KDL), which replace fully-connected layers, and capture higher order feature interactions. The experiments on conventional classification datasets i.e. MNIST, FASHION-MNIST and CIFAR-10, show that the proposed techniques improve the performance of the network compared to classical convolution, pooling and fully connected layers. Moreover, experiments on fine-grained classification i.e. facial expression databases, namely RAF-DB, FER2013 and ExpW demonstrate that the discriminative power of the network is boosted, since the proposed techniques improve the awareness to slight visual details and allows the network reaching state-of-the-art results.
CVNov 14, 2022
An Inter-observer consistent deep adversarial training for visual scanpath predictionMohamed Amine Kerkouri, Marouane Tliba, Aladine Chetouani et al.
The visual scanpath is a sequence of points through which the human gaze moves while exploring a scene. It represents the fundamental concepts upon which visual attention research is based. As a result, the ability to predict them has emerged as an important task in recent years. In this paper, we propose an inter-observer consistent adversarial training approach for scanpath prediction through a lightweight deep neural network. The adversarial method employs a discriminative neural network as a dynamic loss that is better suited to model the natural stochastic phenomenon while maintaining consistency between the distributions related to the subjective nature of scanpaths traversed by different observers. Through extensive testing, we show the competitiveness of our approach in regard to state-of-the-art methods.
IVAug 1, 2024
Temporal Evolution of Knee Osteoarthritis: A Diffusion-based Morphing Model for X-ray Medical Image SynthesisZhe Wang, Aladine Chetouani, Rachid Jennane et al.
Knee Osteoarthritis (KOA) is a common musculoskeletal disorder that significantly affects the mobility of older adults. In the medical domain, images containing temporal data are frequently utilized to study temporal dynamics and statistically monitor disease progression. While deep learning-based generative models for natural images have been widely researched, there are comparatively few methods available for synthesizing temporal knee X-rays. In this work, we introduce a novel deep-learning model designed to synthesize intermediate X-ray images between a specific patient's healthy knee and severe KOA stages. During the testing phase, based on a healthy knee X-ray, the proposed model can produce a continuous and effective sequence of KOA X-ray images with varying degrees of severity. Specifically, we introduce a Diffusion-based Morphing Model by modifying the Denoising Diffusion Probabilistic Model. Our approach integrates diffusion and morphing modules, enabling the model to capture spatial morphing details between source and target knee X-ray images and synthesize intermediate frames along a geodesic path. A hybrid loss consisting of diffusion loss, morphing loss, and supervision loss was employed. We demonstrate that our proposed approach achieves the highest temporal frame synthesis performance, effectively augmenting data for classification models and simulating the progression of KOA.
CVMar 15, 2024Code
Shifting Focus: From Global Semantics to Local Prominent Features in Swin-Transformer for Knee Osteoarthritis Severity AssessmentAymen Sekhri, Marouane Tliba, Mohamed Amine Kerkouri et al.
Conventional imaging diagnostics frequently encounter bottlenecks due to manual inspection, which can lead to delays and inconsistencies. Although deep learning offers a pathway to automation and enhanced accuracy, foundational models in computer vision often emphasize global context at the expense of local details, which are vital for medical imaging diagnostics. To address this, we harness the Swin Transformer's capacity to discern extended spatial dependencies within images through the hierarchical framework. Our novel contribution lies in refining local feature representations, orienting them specifically toward the final distribution of the classifier. This method ensures that local features are not only preserved but are also enriched with task-specific information, enhancing their relevance and detail at every hierarchical level. By implementing this strategy, our model demonstrates significant robustness and precision, as evidenced by extensive validation of two established benchmarks for Knee OsteoArthritis (KOA) grade classification. These results highlight our approach's effectiveness and its promising implications for the future of medical imaging diagnostics. Our implementation is available on https://github.com/mtliba/KOA_NLCS2024
CVFeb 25
SPGen: Stochastic scanpath generation for paintings using unsupervised domain adaptationMohamed Amine Kerkouri, Marouane Tliba, Aladine Chetouani et al.
Understanding human visual attention is key to preserving cultural heritage We introduce SPGen a novel deep learning model to predict scanpaths the sequence of eye movementswhen viewers observe paintings. Our architecture uses a Fully Convolutional Neural Network FCNN with differentiable fixation selection and learnable Gaussian priors to simulate natural viewing biases To address the domain gap between photographs and artworks we employ unsupervised domain adaptation via a gradient reversal layer allowing the model to transfer knowledge from natural scenes to paintings Furthermore a random noise sampler models the inherent stochasticity of eyetracking data. Extensive testing shows SPGen outperforms existing methods offering a powerful tool to analyze gaze behavior and advance the preservation and appreciation of artistic treasures.
IVFeb 26, 2023
Key-Exchange Convolutional Auto-Encoder for Data Augmentation in Early Knee Osteoarthritis DetectionZhe Wang, Aladine Chetouani, Mohamed Jarraya et al.
Knee Osteoarthritis (KOA) is a common musculoskeletal condition that significantly affects mobility and quality of life, particularly in elderly populations. However, training deep learning models for early KOA classification is often hampered by the limited availability of annotated medical datasets, owing to the high costs and labour-intensive nature of data labelling. Traditional data augmentation techniques, while useful, rely on simple transformations and fail to introduce sufficient diversity into the dataset. To address these challenges, we propose the Key-Exchange Convolutional Auto-Encoder (KECAE) as an innovative Artificial Intelligence (AI)-based data augmentation strategy for early KOA classification. Our model employs a convolutional autoencoder with a novel key-exchange mechanism that generates synthetic images by selectively exchanging key pathological features between X-ray images, which not only diversifies the dataset but also ensures the clinical validity of the augmented data. A hybrid loss function is introduced to supervise feature learning and reconstruction, integrating multiple components, including reconstruction, supervision, and feature separation losses. Experimental results demonstrate that the KECAE-generated data significantly improve the performance of KOA classification models, with accuracy gains of up to 1.98% across various standard and state-of-the-art architectures. Furthermore, a clinical validation study involving expert radiologists confirms the anatomical plausibility and diagnostic realism of the synthetic outputs. These findings highlight the potential of KECAE as a robust tool for augmenting medical datasets in early KOA detection.
IVJun 18, 2025Code
Diffusion-based Counterfactual Augmentation: Towards Robust and Interpretable Knee Osteoarthritis GradingZhe Wang, Yuhua Ru, Aladine Chetouani et al.
Automated grading of Knee Osteoarthritis (KOA) from radiographs is challenged by significant inter-observer variability and the limited robustness of deep learning models, particularly near critical decision boundaries. To address these limitations, this paper proposes a novel framework, Diffusion-based Counterfactual Augmentation (DCA), which enhances model robustness and interpretability by generating targeted counterfactual examples. The method navigates the latent space of a diffusion model using a Stochastic Differential Equation (SDE), governed by balancing a classifier-informed boundary drive with a manifold constraint. The resulting counterfactuals are then used within a self-corrective learning strategy to improve the classifier by focusing on its specific areas of uncertainty. Extensive experiments on the public Osteoarthritis Initiative (OAI) and Multicenter Osteoarthritis Study (MOST) datasets demonstrate that this approach significantly improves classification accuracy across multiple model architectures. Furthermore, the method provides interpretability by visualizing minimal pathological changes and revealing that the learned latent space topology aligns with clinical knowledge of KOA progression. The DCA framework effectively converts model uncertainty into a robust training signal, offering a promising pathway to developing more accurate and trustworthy automated diagnostic systems. Our code is available at https://github.com/ZWang78/DCA.
IVApr 9, 2025Code
MoEDiff-SR: Mixture of Experts-Guided Diffusion Model for Region-Adaptive MRI Super-ResolutionZhe Wang, Yuhua Ru, Aladine Chetouani et al.
Magnetic Resonance Imaging (MRI) at lower field strengths (e.g., 3T) suffers from limited spatial resolution, making it challenging to capture fine anatomical details essential for clinical diagnosis and neuroimaging research. To overcome this limitation, we propose MoEDiff-SR, a Mixture of Experts (MoE)-guided diffusion model for region-adaptive MRI Super-Resolution (SR). Unlike conventional diffusion-based SR models that apply a uniform denoising process across the entire image, MoEDiff-SR dynamically selects specialized denoising experts at a fine-grained token level, ensuring region-specific adaptation and enhanced SR performance. Specifically, our approach first employs a Transformer-based feature extractor to compute multi-scale patch embeddings, capturing both global structural information and local texture details. The extracted feature embeddings are then fed into an MoE gating network, which assigns adaptive weights to multiple diffusion-based denoisers, each specializing in different brain MRI characteristics, such as centrum semiovale, sulcal and gyral cortex, and grey-white matter junction. The final output is produced by aggregating the denoised results from these specialized experts according to dynamically assigned gating probabilities. Experimental results demonstrate that MoEDiff-SR outperforms existing state-of-the-art methods in terms of quantitative image quality metrics, perceptual fidelity, and computational efficiency. Difference maps from each expert further highlight their distinct specializations, confirming the effective region-specific denoising capability and the interpretability of expert contributions. Additionally, clinical evaluation validates its superior diagnostic capability in identifying subtle pathological features, emphasizing its practical relevance in clinical neuroimaging. Our code is available at https://github.com/ZWang78/MoEDiff-SR.
IVJan 30, 2025Code
Distillation-Driven Diffusion Model for Multi-Scale MRI Super-Resolution: Make 1.5T MRI Great AgainZhe Wang, Yuhua Ru, Fabian Bauer et al.
Magnetic Resonance Imaging (MRI) offers critical insights into microstructural details, however, the spatial resolution of standard 1.5T imaging systems is often limited. In contrast, 7T MRI provides significantly enhanced spatial resolution, enabling finer visualization of anatomical structures. Though this, the high cost and limited availability of 7T MRI hinder its widespread use in clinical settings. To address this challenge, a novel Super-Resolution (SR) model is proposed to generate 7T-like MRI from standard 1.5T MRI scans. Our approach leverages a diffusion-based architecture, incorporating gradient nonlinearity correction and bias field correction data from 7T imaging as guidance. Moreover, to improve deployability, a progressive distillation strategy is introduced. Specifically, the student model refines the 7T SR task with steps, leveraging feature maps from the inference phase of the teacher model as guidance, aiming to allow the student model to achieve progressively 7T SR performance with a smaller, deployable model size. Experimental results demonstrate that our baseline teacher model achieves state-of-the-art SR performance. The student model, while lightweight, sacrifices minimal performance. Furthermore, the student model is capable of accepting MRI inputs at varying resolutions without the need for retraining, significantly further enhancing deployment flexibility. The clinical relevance of our proposed method is validated using clinical data from Massachusetts General Hospital. Our code is available at https://github.com/ZWang78/SR.
IVJun 17, 2021Code
A Multi-task convolutional neural network for blind stereoscopic image quality assessment using naturalness analysisSalima Bourbia, Ayoub Karine, Aladine Chetouani et al.
This paper addresses the problem of blind stereoscopic image quality assessment (NR-SIQA) using a new multi-task deep learning based-method. In the field of stereoscopic vision, the information is fairly distributed between the left and right views as well as the binocular phenomenon. In this work, we propose to integrate these characteristics to estimate the quality of stereoscopic images without reference through a convolutional neural network. Our method is based on two main tasks: the first task predicts naturalness analysis based features adapted to stereo images, while the second task predicts the quality of such images. The former, so-called auxiliary task, aims to find more robust and relevant features to improve the quality prediction. To do this, we compute naturalness-based features using a Natural Scene Statistics (NSS) model in the complex wavelet domain. It allows to capture the statistical dependency between pairs of the stereoscopic images. Experiments are conducted on the well known LIVE PHASE I and LIVE PHASE II databases. The results obtained show the relevance of our method when comparing with those of the state-of-the-art. Our code is available online on https://github.com/Bourbia-Salima/multitask-cnn-nrsiqa_2021.
CVFeb 25, 2021Code
A deep perceptual metric for 3D point cloudsMaurice Quach, Aladine Chetouani, Giuseppe Valenzise et al.
Point clouds are essential for storage and transmission of 3D content. As they can entail significant volumes of data, point cloud compression is crucial for practical usage. Recently, point cloud geometry compression approaches based on deep neural networks have been explored. In this paper, we evaluate the ability to predict perceptual quality of typical voxel-based loss functions employed to train these networks. We find that the commonly used focal loss and weighted binary cross entropy are poorly correlated with human perception. We thus propose a perceptual loss function for 3D point clouds which outperforms existing loss functions on the ICIP2020 subjective dataset. In addition, we propose a novel truncated distance field voxel grid representation and find that it leads to sparser latent spaces and loss functions that are more correlated with perceived visual quality compared to a binary representation. The source code is available at https://github.com/mauriceqch/2021_pc_perceptual_loss.
LGMay 15, 2024
Enhancing Maritime Trajectory Forecasting via H3 Index and Causal Language Modelling (CLM)Nicolas Drapier, Aladine Chetouani, Aurélien Chateigner
The prediction of ship trajectories is a growing field of study in artificial intelligence. Traditional methods rely on the use of LSTM, GRU networks, and even Transformer architectures for the prediction of spatio-temporal series. This study proposes a viable alternative for predicting these trajectories using only GNSS positions. It considers this spatio-temporal problem as a natural language processing problem. The latitude/longitude coordinates of AIS messages are transformed into cell identifiers using the H3 index. Thanks to the pseudo-octal representation, it becomes easier for language models to learn the spatial hierarchy of the H3 index. The method is compared with a classical Kalman filter, widely used in the maritime domain, and introduces the Fréchet distance as the main evaluation metric. We show that it is possible to predict ship trajectories quite precisely up to 8 hours ahead with 30 minutes of context, using solely GNSS positions, without relying on any additional information such as speed, course, or external conditions - unlike many traditional methods. We demonstrate that this alternative works well enough to predict trajectories worldwide.
CVApr 9
What They Saw, Not Just Where They Looked: Semantic Scanpath Similarity via VLMs and NLP metricMohamed Amine Kerkouri, Marouane Tliba, Bin Wang et al.
Scanpath similarity metrics are central to eye-movement research, yet existing methods predominantly evaluate spatial and temporal alignment while neglecting semantic equivalence between attended image regions. We present a semantic scanpath similarity framework that integrates vision-language models (VLMs) into eye-tracking analysis. Each fixation is encoded under controlled visual context (patch-based and marker-based strategies) and transformed into concise textual descriptions, which are aggregated into scanpath-level representations. Semantic similarity is then computed using embedding-based and lexical NLP metrics and compared against established spatial measures, including MultiMatch and DTW. Experiments on free-viewing eye-tracking data demonstrate that semantic similarity captures partially independent variance from geometric alignment, revealing cases of high content agreement despite spatial divergence. We further analyze the impact of contextual encoding on description fidelity and metric stability. Our findings suggest that multimodal foundation models enable interpretable, content-aware extensions of classical scanpath analysis, providing a complementary dimension for gaze research within the ETRA community.
CVJul 15, 2025
Combining Transformers and CNNs for Efficient Object Detection in High-Resolution Satellite ImageryNicolas Drapier, Aladine Chetouani, Aurélien Chateigner
We present GLOD, a transformer-first architecture for object detection in high-resolution satellite imagery. GLOD replaces CNN backbones with a Swin Transformer for end-to-end feature extraction, combined with novel UpConvMixer blocks for robust upsampling and Fusion Blocks for multi-scale feature integration. Our approach achieves 32.95\% on xView, outperforming SOTA methods by 11.46\%. Key innovations include asymmetric fusion with CBAM attention and a multi-path head design capturing objects across scales. The architecture is optimized for satellite imagery challenges, leveraging spatial priors while maintaining computational efficiency.
CVApr 21, 2025
Shifts in Doctors' Eye Movements Between Real and AI-Generated Medical ImagesDavid C Wong, Bin Wang, Gorkem Durak et al.
Eye-tracking analysis plays a vital role in medical imaging, providing key insights into how radiologists visually interpret and diagnose clinical cases. In this work, we first analyze radiologists' attention and agreement by measuring the distribution of various eye-movement patterns, including saccades direction, amplitude, and their joint distribution. These metrics help uncover patterns in attention allocation and diagnostic strategies. Furthermore, we investigate whether and how doctors' gaze behavior shifts when viewing authentic (Real) versus deep-learning-generated (Fake) images. To achieve this, we examine fixation bias maps, focusing on first, last, short, and longest fixations independently, along with detailed saccades patterns, to quantify differences in gaze distribution and visual saliency between authentic and synthetic images.
CVMar 15, 2024
Quantization Effects on Neural Networks Perception: How would quantization change the perceptual field of vision models?Mohamed Amine Kerkouri, Marouane Tliba, Aladine Chetouani et al.
Neural network quantization is a critical technique for deploying models on resource-limited devices. Despite its widespread use, the impact of quantization on model perceptual fields, particularly in relation to class activation maps (CAMs), remains underexplored. This study investigates how quantization influences the spatial recognition abilities of vision models by examining the alignment between CAMs and visual salient objects maps across various architectures. Utilizing a dataset of 10,000 images from ImageNet, we conduct a comprehensive evaluation of six diverse CNN architectures: VGG16, ResNet50, EfficientNet, MobileNet, SqueezeNet, and DenseNet. Through the systematic application of quantization techniques, we identify subtle changes in CAMs and their alignment with Salient object maps. Our results demonstrate the differing sensitivities of these architectures to quantization and highlight its implications for model performance and interpretability in real-world applications. This work primarily contributes to a deeper understanding of neural network quantization, offering insights essential for deploying efficient and interpretable models in practical settings.
CVOct 20, 2025
Morphology-Aware KOA Classification: Integrating Graph Priors with Vision ModelsMarouane Tliba, Mohamed Amine Kerkouri, Yassine Nasser et al.
Knee osteoarthritis (KOA) diagnosis from radiographs remains challenging due to the subtle morphological details that standard deep learning models struggle to capture effectively. We propose a novel multimodal framework that combines anatomical structure with radiographic features by integrating a morphological graph representation - derived from Segment Anything Model (SAM) segmentations - with a vision encoder. Our approach enforces alignment between geometry-informed graph embeddings and radiographic features through mutual information maximization, significantly improving KOA classification accuracy. By constructing graphs from anatomical features, we introduce explicit morphological priors that mirror clinical assessment criteria, enriching the feature space and enhancing the model's inductive bias. Experiments on the Osteoarthritis Initiative dataset demonstrate that our approach surpasses single-modality baselines by up to 10\% in accuracy (reaching nearly 80\%), while outperforming existing state-of-the-art methods by 8\% in accuracy and 11\% in F1 score. These results underscore the critical importance of incorporating anatomical structure into radiographic analysis for accurate KOA severity grading.
CVMar 26, 2025
Eyes Tell the Truth: GazeVal Highlights Shortcomings of Generative AI in Medical ImagingDavid Wong, Bin Wang, Gorkem Durak et al.
The demand for high-quality synthetic data for model training and augmentation has never been greater in medical imaging. However, current evaluations predominantly rely on computational metrics that fail to align with human expert recognition. This leads to synthetic images that may appear realistic numerically but lack clinical authenticity, posing significant challenges in ensuring the reliability and effectiveness of AI-driven medical tools. To address this gap, we introduce GazeVal, a practical framework that synergizes expert eye-tracking data with direct radiological evaluations to assess the quality of synthetic medical images. GazeVal leverages gaze patterns of radiologists as they provide a deeper understanding of how experts perceive and interact with synthetic data in different tasks (i.e., diagnostic or Turing tests). Experiments with sixteen radiologists revealed that 96.6% of the generated images (by the most recent state-of-the-art AI algorithm) were identified as fake, demonstrating the limitations of generative AI in producing clinically accurate images.
IVMar 16, 2025
Feasibility study for reconstruction of knee MRI from one corresponding X-ray via CNNZhe Wang, Aladine Chetouani, Rachid Jennane
Generally, X-ray, as an inexpensive and popular medical imaging technique, is widely chosen by medical practitioners. With the development of medical technology, Magnetic Resonance Imaging (MRI), an advanced medical imaging technique, has already become a supplementary diagnostic option for the diagnosis of KOA. We propose in this paper a deep-learning-based approach for generating MRI from one corresponding X-ray. Our method uses the hidden variables of a Convolutional Auto-Encoder (CAE) model, trained for reconstructing X-ray image, as inputs of a generator model to provide 3D MRI.
CVJan 1, 2022
SalyPath360: Saliency and Scanpath Prediction Framework for Omnidirectional ImagesMohamed Amine Kerkouri, Marouane Tliba, Aladine Chetouani et al.
This paper introduces a new framework to predict visual attention of omnidirectional images. The key setup of our architecture is the simultaneous prediction of the saliency map and a corresponding scanpath for a given stimulus. The framework implements a fully encoder-decoder convolutional neural network augmented by an attention module to generate representative saliency maps. In addition, an auxiliary network is employed to generate probable viewport center fixation points through the SoftArgMax function. The latter allows to derive fixation points from feature maps. To take advantage of the scanpath prediction, an adaptive joint probability distribution model is then applied to construct the final unbiased saliency map by leveraging the encoder decoder-based saliency map and the scanpath-based saliency heatmap. The proposed framework was evaluated in terms of saliency and scanpath prediction, and the results were compared to state-of-the-art methods on Salient360! dataset. The results showed the relevance of our framework and the benefits of such architecture for further omnidirectional visual attention prediction tasks.
CVDec 8, 2021
A Simple and efficient deep Scanpath PredictionMohamed Amine Kerkouri, Aladine Chetouani
Visual scanpath is the sequence of fixation points that the human gaze travels while observing an image, and its prediction helps in modeling the visual attention of an image. To this end several models were proposed in the literature using complex deep learning architectures and frameworks. Here, we explore the efficiency of using common deep learning architectures, in a simple fully convolutional regressive manner. We experiment how well these models can predict the scanpaths on 2 datasets. We compare with other models using different metrics and show competitive results that sometimes surpass previous complex architectures. We also compare the different leveraged backbone architectures based on their performances on the experiment to deduce which ones are the most suitable for the task.
CVSep 24, 2021
Learnable Triangulation for Deep Learning-based 3D Reconstruction of Objects of Arbitrary Topology from Single RGB ImagesTarek Ben Charrada, Hedi Tabia, Aladine Chetouani et al.
We propose a novel deep reinforcement learning-based approach for 3D object reconstruction from monocular images. Prior works that use mesh representations are template based. Thus, they are limited to the reconstruction of objects that have the same topology as the template. Methods that use volumetric grids as intermediate representations are computationally expensive, which limits their application in real-time scenarios. In this paper, we propose a novel end-to-end method that reconstructs 3D objects of arbitrary topology from a monocular image. It is composed of of (1) a Vertex Generation Network (VGN), which predicts the initial 3D locations of the object's vertices from an input RGB image, (2) a differentiable triangulation layer, which learns in a non-supervised manner, using a novel reinforcement learning algorithm, the best triangulation of the object's vertices, and finally, (3) a hierarchical mesh refinement network that uses graph convolutions to refine the initial mesh. Our key contribution is the learnable triangulation process, which recovers in an unsupervised manner the topology of the input shape. Our experiments on ShapeNet and Pix3D benchmarks show that the proposed method outperforms the state-of-the-art in terms of visual quality, reconstruction accuracy, and computational time.
CVJun 29, 2021
SALYPATH: A Deep-Based Architecture for visual attention predictionMohamed Amine Kerkouri, Marouane Tliba, Aladine Chetouani et al.
Human vision is naturally more attracted by some regions within their field of view than others. This intrinsic selectivity mechanism, so-called visual attention, is influenced by both high- and low-level factors; such as the global environment (illumination, background texture, etc.), stimulus characteristics (color, intensity, orientation, etc.), and some prior visual information. Visual attention is useful for many computer vision applications such as image compression, recognition, and captioning. In this paper, we propose an end-to-end deep-based method, so-called SALYPATH (SALiencY and scanPATH), that efficiently predicts the scanpath of an image through features of a saliency model. The idea is predict the scanpath by exploiting the capacity of a deep-based model to predict the saliency. The proposed method was evaluated through 2 well-known datasets. The results obtained showed the relevance of the proposed framework comparing to state-of-the-art models.
CVSep 22, 2020
Kernelized dense layers for facial expression recognitionM. Amine Mahmoudi, Aladine Chetouani, Fatma Boufera et al.
Fully connected layer is an essential component of Convolutional Neural Networks (CNNs), which demonstrates its efficiency in computer vision tasks. The CNN process usually starts with convolution and pooling layers that first break down the input images into features, and then analyze them independently. The result of this process feeds into a fully connected neural network structure which drives the final classification decision. In this paper, we propose a Kernelized Dense Layer (KDL) which captures higher order feature interactions instead of conventional linear relations. We apply this method to Facial Expression Recognition (FER) and evaluate its performance on RAF, FER2013 and ExpW datasets. The experimental results demonstrate the benefits of such layer and show that our model achieves competitive results with respect to the state-of-the-art approaches.
MMOct 28, 2019
Blind Robust 3-D Mesh Watermarking based on Mesh Saliency and QIM quantization for Copyright ProtectionMohamed Hamidi, Aladine Chetouani, Mohamed El Haziti et al.
Due to the recent demand of 3-D models in several applications like medical imaging, video games, among others, the necessity of implementing 3-D mesh watermarking schemes aiming to protect copyright has increased considerably. The majority of robust 3-D watermarking techniques have essentially focused on the robustness against attacks while the imperceptibility of these techniques is still a real issue. In this context, a blind robust 3-D mesh watermarking method based on mesh saliency and Quantization Index Modulation (QIM) for Copyright protection is proposed. The watermark is embedded by quantifying the vertex norms of the 3-D mesh using QIM scheme since it offers a good robustness-capacity tradeoff. The choice of the vertices is adjusted by the mesh saliency to achieve watermark robustness and to avoid visual distortions. The experimental results show the high imperceptibility of the proposed scheme while ensuring a good robustness against a wide range of attacks including additive noise, similarity transformations, smoothing, quantization, etc.
CROct 24, 2019
A Robust Blind 3-D Mesh Watermarking technique based on SCS quantization and mesh Saliency for Copyright ProtectionMohamed Hamidi, Aladine Chetouani, Mohamed El Haziti1 et al.
Due to the recent demand of 3-D meshes in a wide range of applications such as video games, medical imaging, film special effect making, computer-aided design (CAD), among others, the necessity of implementing 3-D mesh watermarking schemes aiming to protect copyright has increased in the last decade. Nowadays, the majority of robust 3-D watermarking approaches have mainly focused on the robustness against attacks while the imperceptibility of these techniques is still a serious challenge. In this context, a blind robust 3-D mesh watermarking method based on mesh saliency and scalar Costa scheme (SCS) for Copyright protection is proposed. The watermark is embedded by quantifying the vertex norms of the 3-D mesh by SCS scheme using the vertex normal norms as synchronizing primitives. The choice of these vertices is based on 3-D mesh saliency to achieve watermark robustness while ensuring high imperceptibility. The experimental results show that in comparison with the alternative methods, the proposed work can achieve a high imperceptibility performance while ensuring a good robustness against several common attacks including similarity transformations, noise addition, quantization, smoothing, elements reordering, etc.