Hatem Hajri

LG
h-index30
15papers
191citations
Novelty42%
AI Score42

15 Papers

LGJun 21, 2022
Riemannian data-dependent randomized smoothing for neural networks certification

Pol Labarbarie, Hatem Hajri, Marc Arnaudon

Certification of neural networks is an important and challenging problem that has been attracting the attention of the machine learning community since few years. In this paper, we focus on randomized smoothing (RS) which is considered as the state-of-the-art method to obtain certifiably robust neural networks. In particular, a new data-dependent RS technique called ANCER introduced recently can be used to certify ellipses with orthogonal axis near each input data of the neural network. In this work, we remark that ANCER is not invariant under rotation of input data and propose a new rotationally-invariant formulation of it which can certify ellipses without constraints on their axis. Our approach called Riemannian Data Dependant Randomized Smoothing (RDDRS) relies on information geometry techniques on the manifold of covariance matrices and can certify bigger regions than ANCER based on our experiments on the MNIST dataset.

OCMay 24, 2022
Realization Theory Of Recurrent Neural ODEs Using Polynomial System Embeddings

Martin Gonzalez, Thibault Defourneau, Hatem Hajri et al.

In this paper we show that neural ODE analogs of recurrent (ODE-RNN) and Long Short-Term Memory (ODE-LSTM) networks can be algorithmically embeddeded into the class of polynomial systems. This embedding preserves input-output behavior and can suitably be extended to other neural DE architectures. We then use realization theory of polynomial systems to provide necessary conditions for an input-output map to be realizable by an ODE-LSTM and sufficient conditions for minimality of such systems. These results represent the first steps towards realization theory of recurrent neural ODE architectures, which is is expected be useful for model reduction and learning algorithm analysis of recurrent neural ODEs.

LGJun 16, 2022
Noisy Learning for Neural ODEs Acts as a Robustness Locus Widening

Martin Gonzalez, Hatem Hajri, Loic Cantat et al.

We investigate the problems and challenges of evaluating the robustness of Differential Equation-based (DE) networks against synthetic distribution shifts. We propose a novel and simple accuracy metric which can be used to evaluate intrinsic robustness and to validate dataset corruption simulators. We also propose methodology recommendations, destined for evaluating the many faces of neural DEs' robustness and for comparing them with their discrete counterparts rigorously. We then use this criteria to evaluate a cheap data augmentation technique as a reliable way for demonstrating the natural robustness of neural ODEs against simulated image corruptions across multiple datasets.

LGJan 12
Reward-Preserving Attacks For Robust Reinforcement Learning

Lucas Schott, Elies Gherbi, Hatem Hajri et al.

Adversarial robustness in RL is difficult because perturbations affect entire trajectories: strong attacks can break learning, while weak attacks yield little robustness, and the appropriate strength varies by state. We propose $α$-reward-preserving attacks, which adapt the strength of the adversary so that an $α$ fraction of the nominal-to-worst-case return gap remains achievable at each state. In deep RL, we use a gradient-based attack direction and learn a state-dependent magnitude $η\le η_{\mathcal B}$ selected via a critic $Q^π_α((s,a),η)$ trained off-policy over diverse radii. This adaptive tuning calibrates attack strength and, with intermediate $α$, improves robustness across radii while preserving nominal performance, outperforming fixed- and random-radius baselines.

CVJul 12, 2020Code
Probabilistic Jacobian-based Saliency Maps Attacks

Théo Combey, António Loison, Maxime Faucher et al.

Neural network classifiers (NNCs) are known to be vulnerable to malicious adversarial perturbations of inputs including those modifying a small fraction of the input features named sparse or $L_0$ attacks. Effective and fast $L_0$ attacks, such as the widely used Jacobian-based Saliency Map Attack (JSMA) are practical to fool NNCs but also to improve their robustness. In this paper, we show that penalising saliency maps of JSMA by the output probabilities and the input features of the NNC allows to obtain more powerful attack algorithms that better take into account each input's characteristics. This leads us to introduce improved versions of JSMA, named Weighted JSMA (WJSMA) and Taylor JSMA (TJSMA), and demonstrate through a variety of white-box and black-box experiments on three different datasets (MNIST, CIFAR-10 and GTSRB), that they are both significantly faster and more efficient than the original targeted and non-targeted versions of JSMA. Experiments also demonstrate, in some cases, very competitive results of our attacks in comparison with the Carlini-Wagner (CW) $L_0$ attack, while remaining, like JSMA, significantly faster (WJSMA and TJSMA are more than 50 times faster than CW $L_0$ on CIFAR-10). Therefore, our new attacks provide good trade-offs between JSMA and CW for $L_0$ real-time adversarial testing on datasets such as the ones previously cited. Codes are publicly available through the link https://github.com/probabilistic-jsmas/probabilistic-jsmas.

LGApr 7, 2020Code
Geomstats: A Python Package for Riemannian Geometry in Machine Learning

Nina Miolane, Alice Le Brigant, Johan Mathe et al.

We introduce Geomstats, an open-source Python toolbox for computations and statistics on nonlinear manifolds, such as hyperbolic spaces, spaces of symmetric positive definite matrices, Lie groups of transformations, and many more. We provide object-oriented and extensively unit-tested implementations. Among others, manifolds come equipped with families of Riemannian metrics, with associated exponential and logarithmic maps, geodesics and parallel transport. Statistics and learning algorithms provide methods for estimation, clustering and dimension reduction on manifolds. All associated operations are vectorized for batch computation and provide support for different execution backends, namely NumPy, PyTorch and TensorFlow, enabling GPU acceleration. This paper presents the package, compares it with related libraries and provides relevant code examples. We show that Geomstats provides reliable building blocks to foster research in differential geometry and statistics, and to democratize the use of Riemannian geometry in machine learning applications. The source code is freely available under the MIT license at \url{geomstats.ai}.

CYFeb 5, 2020Code
FRSign: A Large-Scale Traffic Light Dataset for Autonomous Trains

Jeanine Harb, Nicolas Rébéna, Raphaël Chosidow et al.

In the realm of autonomous transportation, there have been many initiatives for open-sourcing self-driving cars datasets, but much less for alternative methods of transportation such as trains. In this paper, we aim to bridge the gap by introducing FRSign, a large-scale and accurate dataset for vision-based railway traffic light detection and recognition. Our recordings were made on selected running trains in France and benefited from carefully hand-labeled annotations. An illustrative dataset which corresponds to ten percent of the acquired data to date is published in open source with the paper. It contains more than 100,000 images illustrating six types of French railway traffic lights and their possible color combinations, together with the relevant information regarding their acquisition such as date, time, sensor parameters, and bounding boxes. This dataset is published in open-source at the address \url{https://frsign.irt-systemx.fr}. We compare, analyze various properties of the dataset and provide metrics to express its variability. We also discuss specific challenges and particularities related to autonomous trains in comparison to autonomous cars.

LGMar 1, 2024
Robust Deep Reinforcement Learning Through Adversarial Attacks and Training : A Survey

Lucas Schott, Josephine Delas, Hatem Hajri et al.

Deep Reinforcement Learning (DRL) is a subfield of machine learning for training autonomous agents that take sequential actions across complex environments. Despite its significant performance in well-known environments, it remains susceptible to minor condition variations, raising concerns about its reliability in real-world applications. To improve usability, DRL must demonstrate trustworthiness and robustness. A way to improve the robustness of DRL to unknown changes in the environmental conditions and possible perturbations is through Adversarial Training, by training the agent against well-suited adversarial attacks on the observations and the dynamics of the environment. Addressing this critical issue, our work presents an in-depth analysis of contemporary adversarial attack and training methodologies, systematically categorizing them and comparing their objectives and operational mechanisms.

CLAug 27, 2025
Towards stable AI systems for Evaluating Arabic Pronunciations

Hadi Zaatiti, Hatem Hajri, Osama Abdullah et al.

Modern Arabic ASR systems such as wav2vec 2.0 excel at word- and sentence-level transcription, yet struggle to classify isolated letters. In this study, we show that this phoneme-level task, crucial for language learning, speech therapy, and phonetic research, is challenging because isolated letters lack co-articulatory cues, provide no lexical context, and last only a few hundred milliseconds. Recogniser systems must therefore rely solely on variable acoustic cues, a difficulty heightened by Arabic's emphatic (pharyngealized) consonants and other sounds with no close analogues in many languages. This study introduces a diverse, diacritised corpus of isolated Arabic letters and demonstrates that state-of-the-art wav2vec 2.0 models achieve only 35% accuracy on it. Training a lightweight neural network on wav2vec embeddings raises performance to 65%. However, adding a small amplitude perturbation (epsilon = 0.05) cuts accuracy to 32%. To restore robustness, we apply adversarial training, limiting the noisy-speech drop to 9% while preserving clean-speech accuracy. We detail the corpus, training pipeline, and evaluation protocol, and release, on demand, data and code for reproducibility. Finally, we outline future work extending these methods to word- and sentence-level frameworks, where precise letter pronunciation remains critical.

LGMay 23, 2023
SEEDS: Exponential SDE Solvers for Fast High-Quality Sampling from Diffusion Models

Martin Gonzalez, Nelson Fernandez, Thuy Tran et al.

A potent class of generative models known as Diffusion Probabilistic Models (DPMs) has become prominent. A forward diffusion process adds gradually noise to data, while a model learns to gradually denoise. Sampling from pre-trained DPMs is obtained by solving differential equations (DE) defined by the learnt model, a process which has shown to be prohibitively slow. Numerous efforts on speeding-up this process have consisted on crafting powerful ODE solvers. Despite being quick, such solvers do not usually reach the optimal quality achieved by available slow SDE solvers. Our goal is to propose SDE solvers that reach optimal quality without requiring several hundreds or thousands of NFEs to achieve that goal. We propose Stochastic Explicit Exponential Derivative-free Solvers (SEEDS), improving and generalizing Exponential Integrator approaches to the stochastic case on several frameworks. After carefully analyzing the formulation of exact solutions of diffusion SDEs, we craft SEEDS to analytically compute the linear part of such solutions. Inspired by the Exponential Time-Differencing method, SEEDS use a novel treatment of the stochastic components of solutions, enabling the analytical computation of their variance, and contains high-order terms allowing to reach optimal quality sampling $\sim3$-$5\times$ faster than previous SDE methods. We validate our approach on several image generation benchmarks, showing that SEEDS outperform or are competitive with previous SDE solvers. Contrary to the latter, SEEDS are derivative and training free, and we fully prove strong convergence guarantees for them.

LGApr 7, 2021
Improving Robustness of Deep Reinforcement Learning Agents: Environment Attack based on the Critic Network

Lucas Schott, Hatem Hajri, Sylvain Lamprier

To improve policy robustness of deep reinforcement learning agents, a line of recent works focus on producing disturbances of the environment. Existing approaches of the literature to generate meaningful disturbances of the environment are adversarial reinforcement learning methods. These methods set the problem as a two-player game between the protagonist agent, which learns to perform a task in an environment, and the adversary agent, which learns to disturb the protagonist via modifications of the considered environment. Both protagonist and adversary are trained with deep reinforcement learning algorithms. Alternatively, we propose in this paper to build on gradient-based adversarial attacks, usually used for classification tasks for instance, that we apply on the critic network of the protagonist to identify efficient disturbances of the environment. Rather than learning an attacker policy, which usually reveals as very complex and unstable, we leverage the knowledge of the critic network of the protagonist, to dynamically complexify the task at each step of the learning process. We show that our method, while being faster and lighter, leads to significantly better improvements in policy robustness than existing methods of the literature.

LGNov 24, 2020
Stochastic sparse adversarial attacks

Manon Césaire, Lucas Schott, Hatem Hajri et al.

This paper introduces stochastic sparse adversarial attacks (SSAA), standing as simple, fast and purely noise-based targeted and untargeted attacks of neural network classifiers (NNC). SSAA offer new examples of sparse (or $L_0$) attacks for which only few methods have been proposed previously. These attacks are devised by exploiting a small-time expansion idea widely used for Markov processes. Experiments on small and large datasets (CIFAR-10 and ImageNet) illustrate several advantages of SSAA in comparison with the-state-of-the-art methods. For instance, in the untargeted case, our method called Voting Folded Gaussian Attack (VFGA) scales efficiently to ImageNet and achieves a significantly lower $L_0$ score than SparseFool (up to $\frac{2}{5}$) while being faster. Moreover, VFGA achieves better $L_0$ scores on ImageNet than Sparse-RS when both attacks are fully successful on a large number of samples.

LGJul 2, 2019
From Node Embedding To Community Embedding : A Hyperbolic Approach

Thomas Gerald, Hadi Zaatiti, Hatem Hajri et al.

Detecting communities on graphs has received significant interest in recent literature. Current state-of-the-art community embedding approach called \textit{ComE} tackles this problem by coupling graph embedding with community detection. Considering the success of hyperbolic representations of graph-structured data in last years, an ongoing challenge is to set up a hyperbolic approach for the community detection problem. The present paper meets this challenge by introducing a Riemannian equivalent of \textit{ComE}. Our proposed approach combines hyperbolic embeddings with Riemannian K-means or Riemannian mixture models to perform community detection. We illustrate the usefulness of this framework through several experiments on real-world social networks and comparisons with \textit{ComE} and recent hyperbolic-based classification approaches.

ROJul 30, 2018
Real Time Lidar and Radar High-Level Fusion for Obstacle Detection and Tracking with evaluation on a ground truth

Hatem Hajri, Mohamed-Cherif Rahal

- Both Lidars and Radars are sensors for obstacle detection. While Lidars are very accurate on obstacles positions and less accurate on their velocities, Radars are more precise on obstacles velocities and less precise on their positions. Sensor fusion between Lidar and Radar aims at improving obstacle detection using advantages of the two sensors. The present paper proposes a real-time Lidar/Radar data fusion algorithm for obstacle detection and tracking based on the global nearest neighbour standard filter (GNN). This algorithm is implemented and embedded in an automative vehicle as a component generated by a real-time multisensor software. The benefits of data fusion comparing with the use of a single sensor are illustrated through several tracking scenarios (on a highway and on a bend) and using real-time kinematic sensors mounted on the ego and tracked vehicles as a ground truth.

ROJul 16, 2018
Automatic generation of ground truth for the evaluation of obstacle detection and tracking techniques

Hatem Hajri, Emmanuel Doucet, Marc Revilloud et al.

As automated vehicles are getting closer to becoming a reality, it will become mandatory to be able to characterise the performance of their obstacle detection systems. This validation process requires large amounts of ground-truth data, which is currently generated by manually annotation. In this paper, we propose a novel methodology to generate ground-truth kinematics datasets for specific objects in real-world scenes. Our procedure requires no annotation whatsoever, human intervention being limited to sensors calibration. We present the recording platform which was exploited to acquire the reference data and a detailed and thorough analytical study of the propagation of errors in our procedure. This allows us to provide detailed precision metrics for each and every data item in our datasets. Finally some visualisations of the acquired data are given.