Anissa Mokraoui

h-index: 12

8 papers

372 citations

Novelty: 39%

AI Score: 29

Ranked #144,662 of 194,257 authors (top 74%); #47,446 in CV (top 80%)

8 Papers

7.3 · CL · Jul 9

Large-Language-Models-as-a-Judge in Theory-Agnostic Adaptive Metric-Alignment for Prototypical Networks in Personality Recognition

Jing Jie Tan, Ban-Hoe Kwan, Danny Wee-Kiat Ng et al.

Personality recognition has traditionally been constrained by theory-dependent formulations, where models are trained to fit predefined psychological taxonomies rather than uncovering shared underlying behavioral structure. This limits generalization, as personality itself is better understood as theory-invariant, while existing annotations reflect only partial and sometimes inconsistent views of the same latent traits. In this work, we introduce JAM ((J)udge for (A)daptive (M)etric-Alignment), a theory-agnostic framework that shifts learning from adapting to predefined personality theories toward discovering unified latent pseudo-facets that capture shared psychological structure. Rather than constraining the model to any personality taxonomy during training or inference, the framework learns generalizable psychological representations and can infer an individual's latent psychological profile directly from the textual samples, without requiring theory-specific labels. JAM achieves this through an Attention-Pooled Graph Prototypical Network that learns structured representations via clustering in embedding space, together with a Cross-Theory Harmonization (CTH) approach that integrates (i) Human-Guided Linkage and (ii) Machine-Induced Consensus to unify heterogeneous datasets without relying on predefined labels. To further improve robustness and data quality, we incorporate an LLM-as-a-Judge mechanism, operating in two configurations, (i) LLM-before-the-loop and (ii) LLM-in-the-loop, which identifies ambiguous samples to guide adaptive metric learning. Experiments show that JAM improves cross-framework generalization and performance, establishing a strong step toward theory-agnostic personality inference and supporting low-resource personality theories. The related code repository, model weights, and artifacts are available at https://research.jingjietan.com/JAM

5.0 · CV · Jul 17, 2023

Rethinking Intersection Over Union for Small Object Detection in Few-Shot Regime

Pierre Le Jeune, Anissa Mokraoui

In Few-Shot Object Detection (FSOD), detecting small objects is extremely difficult. The limited supervision cripples the localization capabilities of the models, and a shift of a few pixels can dramatically reduce the Intersection over Union (IoU) between the ground truth and predicted boxes for small objects. To this end, we propose Scale-adaptive Intersection over Union (SIoU), a novel box similarity measure. SIoU changes with the objects' size: it is more lenient with shifts of small objects. A user study we conducted shows that SIoU aligns better with human judgment than IoU does. Employing SIoU as an evaluation criterion helps to build more user-oriented models. SIoU can also be used as a loss function to prioritize small objects during training, outperforming existing loss functions. SIoU improves small object detection in the non-few-shot regime, but this setting is unrealistic in industry, as annotated detection datasets are often too expensive to acquire. Hence, our experiments mainly focus on the few-shot regime to demonstrate the superiority and versatility of the SIoU loss. SIoU significantly improves FSOD performance on small objects in both natural (Pascal VOC and COCO datasets) and aerial images (DOTA and DIOR). In aerial imagery, small objects are critical, and the SIoU loss achieves new state-of-the-art FSOD on DOTA and DIOR.
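
The sensitivity of IoU to small shifts, which motivates this abstract, can be checked with a few lines of arithmetic. The sketch below computes plain IoU only; it does not implement SIoU, whose formula is not given in the abstract. The helper names `iou` and `shifted_iou`, the box sizes, and the 3-pixel shift are illustrative assumptions.

```python
def iou(box_a, box_b):
    """Plain IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle (may be empty).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def shifted_iou(size, shift):
    """IoU between a square box of side `size` and a copy shifted diagonally."""
    ground_truth = (0, 0, size, size)
    predicted = (shift, shift, size + shift, size + shift)
    return iou(ground_truth, predicted)


# Same 3-pixel diagonal shift, applied to a small and a large object.
small = shifted_iou(size=10, shift=3)    # 49 / 151
large = shifted_iou(size=100, shift=3)   # 9409 / 10591
print(f"10x10 box: IoU = {small:.3f}")
print(f"100x100 box: IoU = {large:.3f}")
```

With the commonly used 0.5 IoU threshold, the shifted 10x10 box would count as a miss while the shifted 100x100 box would count as a hit, even though the localization error in pixels is identical.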

2.6 · CV · Oct 25, 2022 · Code

A Comparative Attention Framework for Better Few-Shot Object Detection on Aerial Images

Pierre Le Jeune, Anissa Mokraoui

Few-Shot Object Detection (FSOD) methods are mainly designed and evaluated on natural image datasets such as Pascal VOC and MS COCO. However, it is not clear whether the best methods for natural images are also the best for aerial images. Furthermore, direct comparison of performance between FSOD methods is difficult due to the wide variety of detection frameworks and training strategies. Therefore, we propose a benchmarking framework that provides a flexible environment to implement and compare attention-based FSOD methods. The proposed framework focuses on attention mechanisms and is divided into three modules: spatial alignment, global attention, and fusion layer. To remain competitive with existing methods, which often leverage complex training, we propose new augmentation techniques designed for object detection. Using this framework, several FSOD methods are reimplemented and compared. This comparison highlights two distinct performance regimes on aerial and natural images: FSOD performs worse on aerial images. Our experiments suggest that small objects, which are harder to detect in the few-shot setting, account for the poor performance. Finally, we develop a novel multiscale alignment method, Cross-Scales Query-Support Alignment (XQSA) for FSOD, to improve the detection of small objects. XQSA significantly outperforms the state of the art on DOTA and DIOR.

3.7 · CV · Sep 13, 2024

Interactive Masked Image Modeling for Multimodal Object Detection in Remote Sensing

Minh-Duc Vu, Zuheng Ming, Fangchen Feng et al.

Object detection in remote sensing imagery plays a vital role in various Earth observation applications. However, unlike object detection in natural scene images, this task is particularly challenging due to the abundance of small, often barely visible objects across diverse terrains. To address these challenges, multimodal learning can be used to integrate features from different data modalities, thereby improving detection accuracy. Nonetheless, the performance of multimodal learning is often constrained by the limited size of labeled datasets. In this paper, we propose to use Masked Image Modeling (MIM) as a pre-training technique, leveraging self-supervised learning on unlabeled data to enhance detection performance. However, conventional MIM such as MAE, which uses masked tokens without any contextual information, struggles to capture fine-grained details due to a lack of interaction with other parts of the image. To address this, we propose a new interactive MIM method that can establish interactions between different tokens, which is particularly beneficial for object detection in remote sensing. Extensive ablation studies and evaluation demonstrate the effectiveness of our approach.

6.2 · CV · Apr 8, 2025 · Code

Analyzing the Impact of Low-Rank Adaptation for Cross-Domain Few-Shot Object Detection in Aerial Images

Hicham Talaoubrid, Anissa Mokraoui, Ismail Ben Ayed et al.

This paper investigates the application of Low-Rank Adaptation (LoRA) to small models for cross-domain few-shot object detection in aerial images. Originally designed for large-scale models, LoRA helps mitigate overfitting, making it a promising approach for resource-constrained settings. We integrate LoRA into DiffusionDet, and evaluate its performance on the DOTA and DIOR datasets. Our results show that LoRA applied after an initial fine-tuning slightly improves performance in low-shot settings (e.g., 1-shot and 5-shot), while full fine-tuning remains more effective in higher-shot configurations. These findings highlight LoRA's potential for efficient adaptation in aerial object detection, encouraging further research into parameter-efficient fine-tuning strategies for few-shot learning. Our code is available here: https://github.com/HichTala/LoRA-DiffusionDet.
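
The low-rank update at the heart of LoRA can be illustrated independently of DiffusionDet. The following is a minimal sketch of the standard formulation, W' = W + (alpha / r) · B · A, written in plain Python; it is not taken from the linked repository, and the function names, matrix sizes, and `alpha` value are assumptions chosen for illustration.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def lora_weight(w, a, b, alpha, rank):
    """Return W + (alpha / rank) * B @ A.

    w: frozen weight, d_out x d_in
    b: trainable factor, d_out x rank
    a: trainable factor, rank x d_in
    """
    delta = matmul(b, a)
    scale = alpha / rank
    return [
        [w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
        for i in range(len(w))
    ]


def lora_param_count(d_out, d_in, rank):
    """Number of trainable parameters in the two low-rank factors."""
    return rank * (d_out + d_in)


d_out, d_in, rank = 4, 6, 2
w = [[0.1 * (i + j) for j in range(d_in)] for i in range(d_out)]
a = [[0.01 * (k + j + 1) for j in range(d_in)] for k in range(rank)]
# B starts at zero, so the adapted layer initially equals the frozen one.
b_zero = [[0.0] * rank for _ in range(d_out)]
adapted = lora_weight(w, a, b_zero, alpha=4, rank=rank)

# For a 256 x 256 layer at rank 4: 2,048 trainable values instead of 65,536.
print(lora_param_count(256, 256, 4), 256 * 256)
```

Only the two small factors are trained while `w` stays frozen, which is the usual reason given for LoRA's reduced tendency to overfit when few annotated examples are available.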

2.6 · CV · Jan 6, 2022

A Unified Framework for Attention-Based Few-Shot Object Detection

Pierre Le Jeune, Anissa Mokraoui

Few-Shot Object Detection (FSOD) is a rapidly growing field in computer vision. It consists of finding all occurrences of a given set of classes with only a few annotated examples for each class. Numerous methods have been proposed to address this challenge, and most of them are based on attention mechanisms. However, the great variety of classic object detection frameworks and training strategies makes performance comparison between methods difficult. In particular, for attention-based FSOD methods, it is laborious to compare the impact of the different attention mechanisms on performance. This paper aims to fill this gap. To do so, a flexible framework is proposed to allow the implementation of most of the attention techniques available in the literature. To properly introduce such a framework, a detailed review of the existing FSOD methods is first provided. Several attention mechanisms are then reimplemented within the framework and compared with all other parameters fixed.

3.7 · CV · Sep 27, 2021

Experience feedback using Representation Learning for Few-Shot Object Detection on Aerial Images

Pierre Le Jeune, Mustapha Lebbah, Anissa Mokraoui et al.

This paper proposes a few-shot method based on Faster R-CNN and representation learning for object detection in aerial images. The two classification branches of Faster R-CNN are replaced by prototypical networks for online adaptation to new classes. These networks produce embedding vectors for each generated box, which are then compared with class prototypes. The distance between an embedding and a prototype determines the corresponding classification score. The resulting networks are trained in an episodic manner. A new detection task is randomly sampled at each epoch, which consists of detecting only a subset of the classes annotated in the dataset. This training strategy encourages the network to adapt to new classes as it would at test time. In addition, several ideas are explored to improve the proposed method, such as a hard negative example mining strategy and self-supervised clustering for background objects. The performance of our method is assessed on DOTA, a large-scale remote sensing image dataset. The experiments conducted provide a broader understanding of the capabilities of representation learning. They highlight in particular some intrinsic weaknesses for the few-shot object detection task. Finally, some suggestions and perspectives are formulated according to these insights.
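
The distance-based scoring described in this abstract can be sketched in a few lines. The example assumes the common prototypical-network recipe (prototype = mean of the support embeddings, score = softmax over negative squared Euclidean distances); the abstract does not state which distance or normalization the paper uses, so those choices, the function names, and the toy 2-D embeddings are all assumptions.

```python
import math


def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def prototype_scores(embedding, support):
    """Score a box embedding against class prototypes.

    support maps each class name to a list of support embeddings.
    Returns a dict of class name -> probability-like score.
    """
    prototypes = {c: mean_vector(vs) for c, vs in support.items()}
    # A closer prototype gives a larger (less negative) logit.
    logits = {c: -squared_distance(embedding, p) for c, p in prototypes.items()}
    top = max(logits.values())  # subtract the max for numerical stability
    exps = {c: math.exp(l - top) for c, l in logits.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}


# Toy episode: two classes with two support embeddings each (hypothetical data).
support = {
    "plane": [[1.0, 0.0], [0.8, 0.2]],
    "ship": [[0.0, 1.0], [0.2, 0.8]],
}
scores = prototype_scores([0.9, 0.1], support)
print(scores)
```

Because prototypes are recomputed from whatever support examples are supplied, adding a new class only requires embedding its few annotated examples, which is what makes this kind of classifier suited to online adaptation.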

1.2 · MM · Dec 10, 2020

A User-experience Driven SSIM-Aware Adaptation Approach for DASH Video Streaming

Mustafa Othman, Ken Chen, Anissa Mokraoui

Dynamic Adaptive Streaming over HTTP (DASH) is a widely used video streaming technique. One key point is the adaptation mechanism, which resides on the client side. This mechanism greatly impacts the overall Quality of Experience (QoE) of the video streaming. In this paper, we propose a new adaptation algorithm for DASH, namely SSIM Based Adaptation (SBA). This mechanism is user-experience driven: it uses the Structural Similarity Index Measurement (SSIM) as the main perceptual video quality indicator; moreover, the adaptation is based on a joint consideration of the SSIM indicator and the physical resources (buffer occupancy, bandwidth) in order to minimize buffer starvation (rebuffering) and video quality instability, as well as to maximize the overall video quality (through SSIM). To evaluate the performance of our proposal, we carried out trace-driven emulation with real traffic traces (captured in a real mobile network). Comparisons with some representative algorithms (BBA, FESTIVE, OSMF) through major QoE metrics show that our adaptation algorithm SBA achieves an efficient adaptation, minimizing both rebuffering and instability, while the displayed video is maintained at a high bitrate level.
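
To make the idea of jointly considering SSIM, buffer occupancy, and bandwidth concrete, here is a deliberately simple selection rule. It is not the SBA algorithm, whose details the abstract does not give; the function name, the bitrate ladder, the 5-second buffer threshold, and the 0.8 safety factor are all hypothetical values chosen for illustration.

```python
def choose_representation(representations, bandwidth_kbps, buffer_s,
                          low_buffer_s=5.0, safety=0.8):
    """Pick a (bitrate_kbps, ssim) pair for the next segment.

    Hypothetical rule, not SBA:
    - if the buffer is nearly empty, take the lowest bitrate to avoid rebuffering;
    - otherwise take the highest-SSIM representation that fits within a
      safety fraction of the estimated bandwidth.
    """
    lowest = min(representations, key=lambda r: r[0])
    if buffer_s < low_buffer_s:
        return lowest
    budget = safety * bandwidth_kbps
    feasible = [r for r in representations if r[0] <= budget]
    if not feasible:
        return lowest
    return max(feasible, key=lambda r: r[1])


# Hypothetical ladder: (bitrate in kbit/s, SSIM of the encoded segment).
ladder = [(500, 0.90), (1500, 0.95), (3000, 0.97)]

healthy = choose_representation(ladder, bandwidth_kbps=2500, buffer_s=12.0)
starving = choose_representation(ladder, bandwidth_kbps=2500, buffer_s=2.0)
print(healthy, starving)
```

Ranking feasible representations by SSIM rather than by bitrate is the point of the sketch: when two encodings differ little in perceived quality, a rule of this kind has no reason to prefer the more expensive one.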