Olivier Déforges

IV
h-index: 28
11 papers
199 citations
Novelty: 45%
AI Score: 38

11 Papers

CV · Jul 19, 2023
NTIRE 2023 Quality Assessment of Video Enhancement Challenge

Xiaohong Liu, Xiongkuo Min, Wei Sun et al. · eth-zurich

This paper reports on the NTIRE 2023 Quality Assessment of Video Enhancement Challenge, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2023. The challenge addresses a major problem in the field of video processing, namely, video quality assessment (VQA) for enhanced videos. The challenge uses the VQA Dataset for Perceptual Video Enhancement (VDPVE), which has a total of 1211 enhanced videos, including 600 videos with color, brightness, and contrast enhancements, 310 videos with deblurring, and 301 deshaked videos. The challenge has a total of 167 registered participants. During the development phase, 61 participating teams submitted their prediction results, with a total of 3168 submissions. During the final testing phase, 37 participating teams made a total of 176 submissions. Finally, 19 participating teams submitted their models and fact sheets, and detailed the methods they used. Some methods have achieved better results than baseline methods, and the winning methods have demonstrated superior prediction performance.

LG · Jun 5, 2022
Federated Adversarial Training with Transformers

Ahmed Aldahdooh, Wassim Hamidouche, Olivier Déforges

Federated learning (FL) has emerged to enable global model training over distributed clients' data while preserving its privacy. However, the globally trained model is vulnerable to evasion attacks, especially adversarial examples (AEs), which are samples carefully crafted to yield false classifications. Adversarial training (AT) has been found to be the most promising approach against evasion attacks and is widely studied for convolutional neural networks (CNNs). Recently, vision transformers have been found to be effective in many computer vision tasks. To the best of the authors' knowledge, no prior work has studied the feasibility of AT in an FL process for vision transformers. This paper investigates such feasibility with different federated model aggregation methods and different vision transformer models with different tokenization and classification head techniques. In order to improve the robust accuracy of the models under non-independent and identically distributed (Non-IID) data, we propose an extension to the FedAvg aggregation method, called FedWAvg. By measuring the similarities between the last layer of the global model and the last layer of the client updates, FedWAvg calculates the weights used to aggregate the local model updates. The experiments show that FedWAvg improves the robust accuracy when compared with other state-of-the-art aggregation methods.
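
The similarity-weighted aggregation described in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so everything here is an assumption made for illustration: cosine similarity as the similarity measure, clipping of negative similarities to zero, normalisation of the weights to sum to one, and the function names `cosine_similarity`, `similarity_weights`, and `weighted_average`. It is not the authors' FedWAvg implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two flat parameter vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_weights(global_last, client_lasts):
    """Turn last-layer similarities into aggregation weights.

    Assumption (not from the paper): negative similarities are clipped
    to zero and the result is normalised to sum to one.
    """
    sims = [max(cosine_similarity(global_last, c), 0.0) for c in client_lasts]
    total = sum(sims)
    if total == 0.0:
        # Fall back to a plain, unweighted average.
        return [1.0 / len(client_lasts)] * len(client_lasts)
    return [s / total for s in sims]


def weighted_average(client_params, weights):
    """Aggregate flat client parameter vectors with the given weights."""
    size = len(client_params[0])
    return [
        sum(w * params[i] for w, params in zip(weights, client_params))
        for i in range(size)
    ]


# Two hypothetical clients: the first one's last layer matches the global one.
weights = similarity_weights([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
aggregated = weighted_average([[2.0, 4.0], [10.0, 20.0]], weights)
```

With uniform weights, `weighted_average` reduces to an unweighted mean of the client updates, which is the equal-sized-client special case of FedAvg.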

IV · Feb 5, 2024 · Code
Cool-chic video: Learned video coding with 800 parameters

Thomas Leguay, Théo Ladune, Pierrick Philippe et al.

We propose a lightweight learned video codec with 900 multiplications per decoded pixel and 800 parameters overall. To the best of our knowledge, this is one of the neural video codecs with the lowest decoding complexity. It is built upon the overfitted image codec Cool-chic and supplements it with an inter coding module to leverage the video's temporal redundancies. The proposed model is able to compress videos using both low-delay and random access configurations and achieves rate-distortion performance close to AVC while outperforming other overfitted codecs such as FFNeRV. The system is made open-source: orange-opensource.github.io/Cool-Chic.

IV · Jul 11, 2025 · Code
VIP: Visual Information Protection through Adversarial Attacks on Vision-Language Models

Hanene F. Z. Brachemi Meftah, Wassim Hamidouche, Sid Ahmed Fezza et al.

Recent years have witnessed remarkable progress in developing Vision-Language Models (VLMs) capable of processing both textual and visual inputs. These models have demonstrated impressive performance, leading to their widespread adoption in various applications. However, this widespread adoption raises serious concerns regarding user privacy, particularly when models inadvertently process or expose private visual information. In this work, we frame the preservation of privacy in VLMs as an adversarial attack problem. We propose a novel attack strategy that selectively conceals information within designated Regions of Interest (ROIs) in an image, effectively preventing VLMs from accessing sensitive content while preserving the semantic integrity of the remaining image. Unlike conventional adversarial attacks that often disrupt the entire image, our method maintains high coherence in unmasked areas. Experimental results across three state-of-the-art VLMs, namely LLaVA, Instruct-BLIP, and BLIP2-T5, demonstrate up to a 98% reduction in detecting targeted ROIs while keeping global image semantics intact, as confirmed by high similarity scores between clean and adversarial outputs. We believe that this work contributes to a more privacy-conscious use of multimodal models and offers a practical tool for further research, with the source code publicly available at: https://github.com/hbrachemi/Vlm_defense-attack.

CV · Jan 14, 2025 · Code
Energy Backdoor Attack to Deep Neural Networks

Hanene F. Z. Brachemi Meftah, Wassim Hamidouche, Sid Ahmed Fezza et al.

The rise of deep learning (DL) has increased computing complexity and energy use, prompting the adoption of application-specific integrated circuits (ASICs) for energy-efficient edge and mobile deployment. However, recent studies have demonstrated the vulnerability of these accelerators to energy attacks. Despite the development of various inference-time energy attacks in prior research, backdoor energy attacks remain unexplored. In this paper, we design an innovative energy backdoor attack against deep neural networks (DNNs) operating on sparsity-based accelerators. Our attack is carried out in two distinct phases: backdoor injection and backdoor stealthiness. Experimental results using ResNet-18 and MobileNet-V2 models trained on the CIFAR-10 and Tiny ImageNet datasets show the effectiveness of our proposed attack in increasing energy consumption on trigger samples while preserving the model's performance on clean/regular inputs. This demonstrates the vulnerability of DNNs to energy backdoor attacks. The source code of our attack is available at: https://github.com/hbrachemi/energy_backdoor.

NE · Apr 19, 2021
Conditional Coding and Variable Bitrate for Practical Learned Video Coding

Théo Ladune, Pierrick Philippe, Wassim Hamidouche et al.

This paper introduces a practical learned video codec. Conditional coding and quantization gain vectors are used to provide flexibility to a single encoder/decoder pair, which is able to compress video sequences at a variable bitrate. The flexibility is leveraged at test time by choosing the rate and GOP structure to optimize a rate-distortion cost. Using the CLIC21 video test conditions, the proposed approach shows performance on par with HEVC.
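
The test-time selection mentioned in this abstract can be sketched as a search over candidate configurations that minimises a Lagrangian rate-distortion cost J = D + λR. This is a minimal sketch under stated assumptions: the abstract does not specify the cost function or the search procedure, and the `Candidate` fields, the GOP labels, and the numeric values below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One hypothetical encoding configuration and its measured outcome."""
    gop: str            # GOP structure label (illustrative)
    gain_index: int     # which quantization gain vector was used (illustrative)
    rate: float         # e.g. bits per pixel
    distortion: float   # e.g. mean squared error


def rd_cost(candidate, lam):
    """Lagrangian rate-distortion cost J = D + lambda * R."""
    return candidate.distortion + lam * candidate.rate


def best_candidate(candidates, lam):
    """Pick the configuration with the lowest rate-distortion cost."""
    return min(candidates, key=lambda c: rd_cost(c, lam))


# Made-up measurements for two configurations.
candidates = [
    Candidate(gop="low-rate", gain_index=0, rate=0.1, distortion=10.0),
    Candidate(gop="high-rate", gain_index=3, rate=0.5, distortion=4.0),
]
choice_small_lambda = best_candidate(candidates, lam=1.0)
choice_large_lambda = best_candidate(candidates, lam=100.0)
```

A small λ favours the low-distortion candidate, while a large λ penalises rate and favours the cheaper one.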

CR · Mar 9, 2021
Revisiting Model's Uncertainty and Confidences for Adversarial Example Detection

Ahmed Aldahdooh, Wassim Hamidouche, Olivier Déforges

Security-sensitive applications that rely on Deep Neural Networks (DNNs) are vulnerable to small perturbations that are crafted to generate Adversarial Examples (AEs). The AEs are imperceptible to humans and cause DNNs to misclassify them. Many defense and detection techniques have been proposed. The model's confidences and Dropout, a popular way to estimate the model's uncertainty, have been used for AE detection, but they have shown limited success against black- and gray-box attacks. Moreover, the state-of-the-art detection techniques have been designed for specific attacks or broken by others, need knowledge about the attacks, are not consistent, increase model parameter overhead, are time-consuming, or add latency at inference time. To trade off these factors, we revisit the model's uncertainty and confidences and propose a novel unsupervised ensemble AE detection mechanism that 1) uses the uncertainty method called SelectiveNet, and 2) processes the outputs of model layers, i.e., feature maps, to generate new confidence probabilities. The detection method is called Selective and Feature based Adversarial Detection (SFAD). Experimental results show that the proposed approach achieves better performance against black- and gray-box attacks than the state-of-the-art methods and achieves comparable performance against white-box attacks. Moreover, results show that SFAD is fully robust against High Confidence Attacks (HCAs) for the MNIST dataset and partially robust for the CIFAR10 dataset.

CR · Mar 6, 2021
Selective Encryption of the Versatile Video Coding Standard

Guillaume Gautier, Mousa FarajAllah, Wassim Hamidouche et al.

Versatile video coding (VVC) is the next generation video coding standard developed by the joint video experts team (JVET) and released in July 2020. VVC introduces several new coding tools providing a significant coding gain over the high efficiency video coding (HEVC) standard. It is well known that increasing the coding efficiency adds more dependencies in the video bitstream, making format-compliant encryption more challenging. In this paper, we tackle the problem of format-compliant, constant-bitrate selective encryption of the VVC standard. These two constraints ensure that the encrypted bitstream can be decoded by any VVC decoder while the bitrate remains unchanged by the encryption. The selective encryption of all possible VVC syntax elements is investigated. A new algorithm is proposed to encrypt, in a format-compliant and constant-bitrate manner, the transform coefficients (TCs) together with other syntax elements at the level of the entropy encoder. The proposed solution was integrated and assessed under the VVC reference software model version 6.0. Experimental results showed that the encryption drastically decreases the video quality while remaining robust against several types of attacks. The encryption space is estimated to be in the range of 15% to 26% of the bitstream size, resulting in a lightweight encryption process. The web page of this work is available at https://gugautie.github.io/sevvc/.

IV · Aug 6, 2020
Optical Flow and Mode Selection for Learning-based Video Coding

Théo Ladune, Pierrick Philippe, Wassim Hamidouche et al.

This paper introduces a new method for inter-frame coding based on two complementary autoencoders: MOFNet and CodecNet. MOFNet aims at computing and conveying the Optical Flow and a pixel-wise coding Mode selection. The optical flow is used to perform a prediction of the frame to code. The coding mode selection enables competition between direct copy of the prediction and transmission through CodecNet. The proposed coding scheme is assessed under the Challenge on Learned Image Compression 2020 (CLIC20) P-frame coding conditions, where it is shown to perform on par with the state-of-the-art video codec ITU/MPEG HEVC. Moreover, the possibility of copying the prediction makes it possible to learn the optical flow in an end-to-end fashion, i.e., without relying on pre-training and/or a dedicated loss term.
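
The pixel-wise competition between copying the prediction and using the CodecNet output can be illustrated as a per-pixel blend. This is a minimal sketch, assuming a soft mask `alpha` in [0, 1] where 1 means "copy the prediction" and 0 means "use the coded pixel"; the mask convention, the flat-list frame representation, and the function name `blend_modes` are illustrative assumptions, not the paper's implementation.

```python
def blend_modes(prediction, codec_output, alpha):
    """Combine two candidate reconstructions pixel by pixel.

    prediction:   pixels obtained by motion-compensated prediction
    codec_output: pixels reconstructed by the (hypothetical) codec branch
    alpha:        per-pixel mask in [0, 1]; 1 copies the prediction,
                  0 uses the codec output (assumed convention)
    """
    if not (len(prediction) == len(codec_output) == len(alpha)):
        raise ValueError("all inputs must have the same number of pixels")
    return [
        a * p + (1.0 - a) * c
        for p, c, a in zip(prediction, codec_output, alpha)
    ]


# Three pixels: copy the first, code the second, mix the third equally.
frame = blend_modes([10.0, 20.0, 30.0], [0.0, 0.0, 0.0], [1.0, 0.0, 0.5])
```

Because the blend is differentiable with respect to `alpha` and both inputs, a mask of this form can in principle be trained jointly with the rest of the system.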

NE · Jul 6, 2020
ModeNet: Mode Selection Network For Learned Video Coding

Théo Ladune, Pierrick Philippe, Wassim Hamidouche et al.

In this paper, a mode selection network (ModeNet) is proposed to enhance deep learning-based video compression. Inspired by traditional video coding, ModeNet's purpose is to enable competition among several coding modes. The proposed ModeNet learns and conveys a pixel-wise partitioning of the frame, used to assign each pixel to the most suited coding mode. ModeNet is trained alongside the different coding modes to minimize a rate-distortion cost. It is a flexible component which can be generalized to other systems to allow competition between different coding tools. The benefit of ModeNet is studied on a P-frame coding task, where it is used to design a method for coding a frame given its prediction. ModeNet-based systems achieve compelling performance when evaluated under the Challenge on Learned Image Compression 2020 (CLIC20) P-frame coding track conditions.

LG · Jun 1, 2019
Perceptual Evaluation of Adversarial Attacks for CNN-based Image Classification

Sid Ahmed Fezza, Yassine Bakhti, Wassim Hamidouche et al.

Deep neural networks (DNNs) have recently achieved state-of-the-art performance and provided significant progress in many machine learning tasks, such as image classification, speech processing, natural language processing, etc. However, recent studies have shown that DNNs are vulnerable to adversarial attacks. For instance, in the image classification domain, adding small imperceptible perturbations to the input image is sufficient to fool the DNN and to cause misclassification. The perturbed image, called an adversarial example, should be visually as close as possible to the original image. However, all the works proposed in the literature for generating adversarial examples have used the $L_{p}$ norms ($L_{0}$, $L_{2}$ and $L_{\infty}$) as distance metrics to quantify the similarity between the original image and the adversarial example. Nonetheless, the $L_{p}$ norms do not correlate with human judgment, making them unsuitable for reliably assessing the perceptual similarity/fidelity of adversarial examples. In this paper, we present a database for visual fidelity assessment of adversarial examples. We describe the creation of the database and evaluate the performance of fifteen state-of-the-art full-reference (FR) image fidelity assessment metrics that could substitute for $L_{p}$ norms. The database as well as subjective scores are publicly available to help design new metrics for adversarial examples and to facilitate future research.
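
For reference, the three $L_{p}$ distances named in this abstract can be computed as in the sketch below. The flat-list image representation and the function name `lp_distances` are illustrative assumptions; the sketch only shows what the norms measure and is not taken from the paper.

```python
import math


def lp_distances(original, adversarial):
    """Return (L0, L2, Linf) distances between two flattened images.

    L0 counts the changed pixels, L2 is the Euclidean length of the
    perturbation, and Linf is the largest single-pixel change.
    """
    if len(original) != len(adversarial):
        raise ValueError("images must have the same number of pixels")
    diffs = [a - o for o, a in zip(original, adversarial)]
    l0 = sum(1 for d in diffs if d != 0)
    l2 = math.sqrt(sum(d * d for d in diffs))
    linf = max((abs(d) for d in diffs), default=0)
    return l0, l2, linf


# A four-pixel toy image with two perturbed pixels.
l0, l2, linf = lp_distances([0, 0, 0, 0], [3, 0, 4, 0])
```

None of these quantities depends on where in the image the perturbation falls, which is one intuition for why they can disagree with perceived fidelity.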