Engelbert Mephu Nguifo

h-index: 27

16 papers

61 citations

Novelty: 38%

AI Score: 45

Ranked #69,487 of 201,326 authors (top 35%); #15,720 in LG (top 37%)

16 Papers

CV · Mar 1 · Code

Multi-Level Bidirectional Decoder Interaction for Uncertainty-Aware Breast Ultrasound Analysis

Abdullah Al Shafi, Md Kawsar Mahmud Khan Zunayed, Safin Ahmmed et al.

Breast ultrasound interpretation requires simultaneous lesion segmentation and tissue classification. However, conventional multi-task learning approaches suffer from task interference and rigid coordination strategies that fail to adapt to instance-specific prediction difficulty. We propose a multi-task framework addressing these limitations through multi-level decoder interaction and uncertainty-aware adaptive coordination. Task Interaction Modules operate at all decoder levels, establishing bidirectional segmentation-classification communication during spatial reconstruction through attention-weighted pooling and multiplicative modulation. Unlike prior single-level or encoder-only approaches, this multi-level design captures scale-specific task synergies across semantic-to-spatial scales, producing complementary task interaction streams. Uncertainty-Proxy Attention adaptively weights base versus enhanced features at each level using feature activation variance, enabling per-level and per-sample task balancing without heuristic tuning. To support instance-adaptive prediction, multi-scale context fusion captures morphological cues across varying lesion sizes. Evaluation on multiple publicly available breast ultrasound datasets demonstrates competitive performance, including 74.5% lesion IoU and 90.6% classification accuracy on the BUSI dataset. Ablation studies confirm that multi-level task interaction provides significant performance gains, validating that decoder-level bidirectional communication is more effective than conventional encoder-only parameter sharing. The code is available at: https://github.com/C-loud-Nine/Uncertainty-Aware-Multi-Level-Decoder-Interaction.

CV · May 30

DASH: Dual-Branch Score Distillation for Guidance-Calibrated Compact Diffusion Models

Abdullah Al Shafi, Kazi Saeed Alam, Sk Imran Hossain et al.

Parameter compression of class-conditional diffusion models reveals an underexplored limitation in output-level distillation: the unconditional score branch remains unsupervised, leaving the classifier-free guidance gap underdetermined in the student. This gap, amplified at every denoising step, admits degenerate solutions where both branches collapse toward identical predictions, rendering guidance ineffective despite low output-level training loss. This paper introduces DASH, a dual-branch distillation framework that independently supervises both score branches, uniquely specifying target branch outputs for each training sample through independent branch constraints, with an anchor term regularising conditional predictions toward ground-truth noise. The framework further introduces TIRT Transfer, which copies the teacher's converged per-timestep importance curriculum into the student as a frozen prior, eliminating the need to relearn it within limited distillation budgets. Experiments on CIFAR-10 and CIFAR-100 demonstrate that 5.9x compression maintains quality within 4 FID points of the teacher at 50-step DDIM sampling, considerably outperforming training from scratch with guidance fidelity well preserved. Ablation studies confirm that unconditional supervision is the dominant contribution, accounting for over 60% of total distillation gain. Curriculum transfer and anchor regularisation provide complementary benefit, together validating dual-branch constraints as empirically essential for guidance-preserving compression.
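
For context, the guidance gap discussed in this abstract can be written in one common parameterisation of classifier-free guidance. The notation is an assumption made here for illustration and is not taken from the paper: $\epsilon_\theta$ is the student's noise prediction, $c$ the class label, $\varnothing$ the null (unconditional) input, and $w$ the guidance scale.

```latex
\hat{\epsilon}_\theta(x_t, c)
  = \epsilon_\theta(x_t, \varnothing)
  + w \,\bigl( \epsilon_\theta(x_t, c) - \epsilon_\theta(x_t, \varnothing) \bigr)
```

If the two branches collapse so that $\epsilon_\theta(x_t, c) = \epsilon_\theta(x_t, \varnothing)$, the bracketed gap vanishes and the guided prediction no longer depends on $w$. Under this notation, that is the degenerate case in which guidance becomes ineffective.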

LG · Sep 28, 2022

Explainable classification of astronomical uncertain time series

Michael Franklin Mbouopda, Emille E O Ishida, Engelbert Mephu Nguifo et al.

Exploring the expansion history of the universe, understanding its evolutionary stages, and predicting its future evolution are important goals in astrophysics. Today, machine learning tools are used to help achieve these goals by analyzing transient sources, which are modeled as uncertain time series. Although black-box methods achieve appreciable performance, existing interpretable time series methods have failed to obtain acceptable performance for this type of data. Furthermore, data uncertainty is rarely taken into account in these methods. In this work, we propose an uncertainty-aware, subsequence-based model which achieves classification performance comparable to that of state-of-the-art methods. Unlike conformal learning, which estimates model uncertainty on predictions, our method takes data uncertainty as additional input. Moreover, our approach is explainable-by-design, giving domain experts the ability to inspect the model and explain its predictions. The explainability of the proposed method also has the potential to inspire new developments in theoretical astrophysics modeling by suggesting important subsequences which depict details of light curve shapes. The dataset, the source code of our experiment, and the results are made available on a public repository.

CV · Dec 5, 2022

A comparative study of emotion recognition methods using facial expressions

Rim EL Cheikh, Hélène Tran, Issam Falih et al.

Understanding the facial expressions of our interlocutor is important to enrich communication and to give it a depth that goes beyond what is explicitly expressed. In fact, studying a person's facial expression gives insight into their hidden emotional state. However, even as humans, and despite our empathy and familiarity with the human emotional experience, we are only able to guess what the other might be feeling. In the fields of artificial intelligence and computer vision, Facial Emotion Recognition (FER) is still a rapidly growing topic, driven mostly by the advancement of deep learning approaches and the improvement of data collection. The main purpose of this paper is to compare the performance of three state-of-the-art networks, each with its own approach to improving on FER tasks, on three FER datasets. The first and second sections respectively describe the three datasets and the three studied network architectures designed for an FER task. The experimental protocol, the results and their interpretation are outlined in the remaining sections.

LG · Dec 22, 2025

A K-Means, Ward and DBSCAN repeatability study

Anthony Bertrand, Engelbert Mephu Nguifo, Violaine Antoine et al.

Reproducibility is essential in machine learning because it ensures that a model or experiment yields the same scientific conclusion. For specific algorithms, repeatability with bitwise-identical results is also key to scientific integrity because it allows debugging. We decompose several very popular clustering algorithms (K-Means, DBSCAN and Ward) into their fundamental steps, and we identify the conditions required to achieve repeatability at each stage. We use an implementation example with the Python library scikit-learn to examine the repeatable aspects of each method. Our experiments reveal inconsistent results with K-Means when the number of OpenMP threads exceeds two. This work aims to raise awareness of this issue among both users and developers, encouraging further investigation and potential fixes.
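
One well-known reason why multithreaded numeric code can lose bitwise repeatability is that floating-point addition is not associative, so splitting a sum across workers can change the result. Whether this is the mechanism behind the K-Means behaviour reported above is an assumption; the sketch below uses no scikit-learn code, and the helper names `fold_sum` and `chunked_sum` are hypothetical.

```python
def fold_sum(values):
    """Add values strictly left to right, starting from 0.0."""
    total = 0.0
    for v in values:
        total += v
    return total


def chunked_sum(values, n_chunks):
    """Sum values as n_chunks partial sums, then add the partials in order.

    This mimics how a parallel reduction splits work across threads.
    """
    if not values:
        return 0.0
    size = -(-len(values) // n_chunks)  # ceiling division
    partials = [fold_sum(values[i:i + size]) for i in range(0, len(values), size)]
    return fold_sum(partials)


# 1.0 is smaller than the spacing between doubles near 1e17, so it is
# lost whenever it is added directly to +/-1e17.
data = [1e17, 1.0, -1e17, 1.0]
serial = chunked_sum(data, 1)  # one "thread": ((1e17 + 1) - 1e17) + 1
split = chunked_sum(data, 2)   # two "threads": (1e17 + 1) + (-1e17 + 1)
print(serial, split, serial == split)  # prints: 1.0 0.0 False
```

The same four numbers give two different totals depending only on how the work is partitioned, which is why a fixed seed alone does not guarantee bitwise-identical output.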

AI · Aug 30, 2022

Expert Opinion Elicitation for Assisting Deep Learning based Lyme Disease Classifier with Patient Data

Sk Imran Hossain, Jocelyn de Goër de Herve, David Abrial et al.

Diagnosing the erythema migrans (EM) skin lesion, the most common early symptom of Lyme disease, using deep learning techniques can help prevent long-term complications. Existing works on deep learning based EM recognition utilize only lesion images, due to the lack of a dataset of Lyme disease related images with associated patient data. Physicians rely on patient information about the background of the skin lesion to confirm their diagnosis. In order to assist the deep learning model with a probability score calculated from patient data, this study elicited opinions from fifteen doctors. For the elicitation process, a questionnaire with questions and possible answers related to EM was prepared. Doctors provided relative weights for the different answers to the questions. We converted the doctors' evaluations to probability scores using Gaussian mixture based density estimation. To validate the elicited probability model, we exploited formal concept analysis and decision trees. The elicited probability scores can be utilized to make image-based deep learning Lyme disease pre-scanners robust.

LG · Feb 3, 2021

Uncertain Time Series Classification With Shapelet Transform

Michael Franklin Mbouopda, Engelbert Mephu Nguifo

Time series classification is a task that aims at classifying chronological data. It is used in a diverse range of domains such as meteorology, medicine and physics. In the last decade, many algorithms have been built to perform this task with very appreciable accuracy. However, applications where time series have uncertainty have been under-explored. Using uncertainty propagation techniques, we propose a new uncertain dissimilarity measure based on Euclidean distance. We then propose the uncertain shapelet transform algorithm for the classification of uncertain time series. The extensive experiments we conducted on state-of-the-art datasets show the effectiveness of our contribution. The source code of our contribution and the datasets we used are all available on a public repository.
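
The idea of propagating uncertainty through a Euclidean distance can be sketched as follows. This is a minimal illustration under stated assumptions, not the measure defined in the paper: each point is a `(value, std_dev)` pair, errors are taken to be independent, and standard first-order propagation is used. The function name `uncertain_sq_euclidean` is hypothetical.

```python
import math


def uncertain_sq_euclidean(x, y):
    """Squared Euclidean distance between two uncertain series.

    x and y are equal-length lists of (value, std_dev) pairs.
    Returns (distance, std_dev), using first-order propagation and
    assuming independent errors (an assumption of this sketch).
    """
    if len(x) != len(y):
        raise ValueError("series must have the same length")
    total = 0.0
    variance = 0.0
    for (xv, xs), (yv, ys) in zip(x, y):
        diff = xv - yv
        diff_std = math.hypot(xs, ys)  # std of (xv - yv)
        total += diff * diff
        # std of diff**2 is approximately 2 * |diff| * diff_std
        variance += (2.0 * abs(diff) * diff_std) ** 2
    return total, math.sqrt(variance)


dist, dist_std = uncertain_sq_euclidean([(3.0, 0.3)], [(0.0, 0.4)])
print(dist, dist_std)
```

Returning the distance together with its own uncertainty lets a downstream step, such as a shapelet-based feature transform, keep track of how reliable each dissimilarity is.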

LG · May 22, 2020

Discovering Frequent Gradual Itemsets with Imprecise Data

Michaël Chirmeni Boujike, Jerry Lonlac, Norbert Tsopze et al.

Gradual patterns, which model complex co-variations of attributes in the form "The more/less X, the more/less Y", play a crucial role in many real-world applications that must manage large amounts of numerical data, such as biological data. Recently, these patterns have caught the attention of the data mining community, and several methods have been defined to automatically extract and manage them from different data models. However, these methods often face the problem of managing the quantity of mined patterns: in many practical applications, computing all patterns can prove intractable for the user-defined frequency threshold, and the lack of focus leads to huge collections of patterns. Moreover, another problem with traditional approaches is that gradualness is defined simply as an increase or a decrease. Indeed, gradualness is assumed as soon as the values of the attribute on two objects differ. As a result, traditional algorithms can present large numbers of patterns to the user even though their gradualness is only a noise effect in the data. To address this issue, this paper proposes introducing gradualness thresholds above which a change is considered an increase or a decrease. In contrast to approaches in the literature, the proposed approach takes into account the distribution of attribute values as well as the user's preferences on the gradualness threshold, and makes it possible to extract gradual patterns on certain databases where existing approaches fail because the search space is too large. Moreover, results from an experimental evaluation on real databases show that the proposed algorithm is scalable and efficient, and that it can eliminate numerous patterns that do not satisfy specific gradualness requirements, leaving a small set of patterns to show to the user.

LG · Dec 11, 2019

Classification des Séries Temporelles Incertaines par Transformation Shapelet

Michael Mbouopda, Engelbert Mephu Nguifo

Time series classification is used in a diverse range of domains such as meteorology, medicine and physics. It aims to classify chronological data. Many accurate approaches have been built during the last decade, and shapelet transformation is one of them. However, none of these approaches takes data uncertainty into account. Using uncertainty propagation techniques, we propose a new dissimilarity measure based on Euclidean distance. We also show how to use this new measure to adapt shapelet transformation to uncertain time series classification. An experimental assessment of our contribution is carried out on some state-of-the-art datasets.

AI · Mar 20, 2019

Extracting Frequent Gradual Patterns Using Constraints Modeling

Jerry Lonlac, Saïd Jabbour, Engelbert Mephu Nguifo et al.

In this paper, we propose a constraint-based modeling approach for the problem of discovering frequent gradual patterns in a numerical dataset. This SAT-based declarative approach offers an additional possibility to benefit from the recent progress in satisfiability testing and to exploit the efficiency of modern SAT solvers for enumerating all frequent gradual patterns in a numerical dataset. Our approach can easily be extended with extra constraints, such as temporal constraints, in order to extract more specific patterns in a broad range of gradual pattern mining applications. We show the practical feasibility of our SAT model by running experiments on two real-world datasets.

AI · May 31, 2017

Towards Learned Clauses Database Reduction Strategies Based on Dominance Relationship

Jerry Lonlac, Engelbert Mephu Nguifo

Clause learning is one of the most important components of a conflict-driven clause learning (CDCL) SAT solver that is effective on industrial instances. Since the number of learned clauses has been proved to be exponential in the worst case, it is necessary to identify the most relevant clauses to maintain and to delete the irrelevant ones. As reported in the literature, several learned clause deletion strategies have been proposed. However, the diversity in both the number of clauses to be removed at each reduction step and the results obtained with each strategy makes it hard to determine which criterion is better. Thus, the problem of selecting which learned clauses to remove during the search remains very challenging. In this paper, we propose a novel approach to identify the most relevant learned clauses without favoring or excluding any of the proposed measures, by adopting the notion of a dominance relationship among those measures. Our approach bypasses the problem of the diversity of results and reaches a compromise between the assessments of these measures. Furthermore, the proposed approach also avoids another non-trivial problem, namely the number of clauses to be deleted at each reduction of the learned clause database.
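
A dominance relationship among several quality measures can be sketched as a generic Pareto (skyline) filter. This illustrates the general notion only and is not the selection procedure of the paper. The clause names and scores are made up, and the choice of LBD, size and activity as the three measures is an assumption for the example.

```python
def dominates(a, b):
    """True if score tuple a is at least as good as b on every measure and
    strictly better on at least one (all scores oriented so higher is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def non_dominated(scores):
    """Return the names whose score tuple is dominated by no other entry."""
    return [
        name
        for name, s in scores.items()
        if not any(dominates(t, s) for other, t in scores.items() if other != name)
    ]


# Hypothetical learned clauses scored as (LBD, size, activity). Lower LBD and
# size are better, so both are negated to make every score "higher is better".
raw = {"c1": (2, 3, 0.9), "c2": (3, 5, 0.5), "c3": (2, 8, 0.95), "c4": (6, 9, 0.1)}
scores = {name: (-lbd, -size, act) for name, (lbd, size, act) in raw.items()}
keep = non_dominated(scores)
print(keep)  # prints: ['c1', 'c3']
```

No single measure is favored here: `c3` is kept despite its large size because no other clause beats it on activity, and the number of clauses retained follows from the data rather than from a fixed deletion ratio.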

LG · Jan 30, 2016

Multiple instance learning for sequence data with across bag dependencies

Manel Zoghlami, Sabeur Aridhi, Mondher Maddouri et al.

In the Multiple Instance Learning (MIL) problem for sequence data, the instances inside the bags are sequences. In some real-world applications such as bioinformatics, comparing a random pair of sequences makes no sense. In fact, each instance may have structural and/or functional relations with instances of other bags. Thus, the classification task should take into account this across-bag relation. In this work, we present two novel MIL approaches for sequence data classification named ABClass and ABSim. ABClass extracts motifs from related instances and uses them to encode sequences. A discriminative classifier is then applied to compute a partial classification result for each set of related sequences. ABSim uses a similarity measure to discriminate the related instances and to compute a scores matrix. For both approaches, an aggregation method is applied in order to generate the final classification result. We applied both approaches to solve the problem of bacterial Ionizing Radiation Resistance prediction. The experimental results of the presented approaches are satisfactory.

NE · Dec 17, 2014

Towards a constructive multilayer perceptron for regression task using non-parametric clustering. A case study of Photo-Z redshift reconstruction

Cyrine Arouri, Engelbert Mephu Nguifo, Sabeur Aridhi et al.

The choice of architecture for an artificial neural network (ANN) is still a challenging task that users face every time. It greatly affects the accuracy of the built network. In fact, there is no optimal method that is applicable to various implementations at the same time. In this paper, we propose a clustering-based method to construct an ANN that resolves the problems of random and ad hoc approaches to multilayer ANN architecture. Our method can be applied to regression problems. Experimental results obtained with different datasets reveal the efficiency of our method.

IR · May 21, 2013

Nouvelle approche de recommandation personnalisee dans les folksonomies basee sur le profil des utilisateurs

Mohamed Nader Jelassi, Sadok Ben Yahia, Engelbert Mephu Nguifo

In folksonomies, users share objects (movies, books, bookmarks, etc.) by annotating them with a set of tags of their own choice. With the rise of the Web 2.0 age, users have become the core of the system since they are both the contributors and the creators of the information. Yet each user has their own profile and their own ideas, which is both the strength and the weakness of folksonomies. Indeed, it would be helpful to take users' profiles into account when suggesting a list of tags and resources, or even a list of friends, in order to make a personal recommendation instead of suggesting the most used tags and resources in the folksonomy. In this paper, we consider users' profiles as a new dimension of a folksonomy, classically composed of three dimensions <users, tags, resources>, and we propose an approach to group users with equivalent profiles and equivalent interests as quadratic concepts. Then, we use such structures to propose our personalized recommendation system of users, tags and resources according to each user's profile. Experiments carried out on two real-world datasets, i.e., MovieLens and BookCrossing, highlight encouraging results in terms of precision as well as a good social evaluation.

CE · Mar 8, 2013

Mining Representative Unsubstituted Graph Patterns Using Prior Similarity Matrix

Wajdi Dhifli, Rabie Saidi, Engelbert Mephu Nguifo

One of the most powerful techniques to study protein structures is to look for recurrent fragments (also called substructures or spatial motifs), then use them as patterns to characterize the proteins under study. An emerging trend consists in parsing proteins' three-dimensional (3D) structures into graphs of amino acids. Hence, the search for recurrent spatial motifs is formulated as a process of frequent subgraph discovery where each subgraph represents a spatial motif. In this scope, several efficient approaches for frequent subgraph discovery have been proposed in the literature. However, the set of discovered frequent subgraphs is too large to be efficiently analyzed and explored in any further process. In this paper, we propose a novel pattern selection approach that shrinks the large number of discovered frequent subgraphs by selecting the representative ones. Existing pattern selection approaches do not exploit domain knowledge. In our approach, by contrast, we incorporate the evolutionary information of amino acids defined in the substitution matrices in order to select the representative subgraphs. We show the effectiveness of our approach on a number of real datasets. The results of our experiments show that our approach is able to considerably decrease the number of motifs while enhancing their interestingness.

LG · Jun 21, 2012

Feature extraction in protein sequences classification: a new stability measure

Rabie Saidi, Sabeur Aridhi, Mondher Maddouri et al.

Feature extraction is an unavoidable task, especially in the critical step of preprocessing biological sequences. This step consists, for example, of transforming the biological sequences into vectors of motifs where each motif is a subsequence that can be seen as a property (or attribute) characterizing the sequence. Hence, we obtain an object-property table where objects are sequences and properties are motifs extracted from sequences. This output can be used to apply standard machine learning tools to perform data mining tasks such as classification. Several previous works have described feature extraction methods for bio-sequence classification, but none of them discussed the robustness of these methods when the input data are perturbed. In this work, we introduce the notion of stability of the generated motifs in order to study the robustness of motif extraction methods. We express this robustness in terms of the ability of the method to reveal any change occurring in the input data and also its ability to target the interesting motifs. We use these criteria to evaluate and experimentally compare four existing extraction methods for biological sequences.