Benoît Macq

CV
h-index36
12papers
46citations
Novelty46%
AI Score44

12 Papers

CVSep 1, 2024Code
Enhancing Remote Sensing Vision-Language Models for Zero-Shot Scene Classification

Karim El Khoury, Maxime Zanella, Benoît Gérin et al.

Vision-Language Models for remote sensing have shown promising uses thanks to their extensive pretraining. However, their conventional usage in zero-shot scene classification methods still involves dividing large images into patches and making independent predictions, i.e., inductive inference, thereby limiting their effectiveness by ignoring valuable contextual information. Our approach tackles this issue by utilizing initial predictions based on text prompting and patch affinity relationships from the image encoder to enhance zero-shot capabilities through transductive inference, all without the need for supervision and at a minor computational cost. Experiments on 10 remote sensing datasets with state-of-the-art Vision-Language Models demonstrate significant accuracy improvements over inductive zero-shot classification. Our source code is publicly available on Github: https://github.com/elkhouryk/RS-TransCLIP

CVNov 18, 2022Code
Mixture Domain Adaptation to Improve Semantic Segmentation in Real-World Surveillance

Sébastien Piérard, Anthony Cioppa, Anaïs Halin et al.

Various tasks encountered in real-world surveillance can be addressed by determining posteriors (e.g. by Bayesian inference or machine learning), based on which critical decisions must be taken. However, the surveillance domain (acquisition device, operating conditions, etc.) is often unknown, which prevents any possibility of scene-specific optimization. In this paper, we define a probabilistic framework and present a formal proof of an algorithm for the unsupervised many-to-infinity domain adaptation of posteriors. Our proposed algorithm is applicable when the probability measure associated with the target domain is a convex combination of the probability measures of the source domains. It makes use of source models and a domain discriminator model trained off-line to compute posteriors adapted on the fly to the target domain. Finally, we show the effectiveness of our algorithm for the task of semantic segmentation in real-world surveillance. The code is publicly available at https://github.com/rvandeghen/MDA.

APMar 20, 2019
Optimal Intermittent Measurements for Tumor Tracking in X-ray Guided Radiotherapy

Antoine Aspeel, Damien Dasnoy, Raphaël M. Jungers et al.

In radiation therapy, tumor tracking is a challenging task that allows a better dose delivery. One practice is to acquire X-ray images in real-time during treatment, that are used to estimate the tumor location. These informations are used to predict the close future tumor trajectory. Kalman prediction is a classical approach for this task. The main drawback of X-ray acquisition is that it irradiates the patient, including its healthy tissues. In the classical Kalman framework, X-ray measurements are taken regularly, i.e. at a constant rate. In this paper, we propose a new approach which relaxes this constraint in order to take measurements when they are the most useful. Our aim is for a given budget of measurements to optimize the tracking process. This idea naturally brings to an optimal intermittent Kalman predictor for which measurement times are selected to minimize the mean squared prediction error over the complete fraction. This optimization problem can be solved directly when the respiratory model has been identified and the optimal sampling times can be computed at once. These optimal measurement times are obtained by solving a combinatorial optimization problem using a genetic algorithm. We created a test benchmark on trajectories validated on one patient. This new prediction method is compared to the regular Kalman predictor and a relative improvement of 9:8% is observed on the root mean square position estimation error.

IVJul 11, 2022
Forward Error Correction applied to JPEG-XS codestreams

Antoine Legrand, Benoît Macq, Christophe De Vleeschouwer

JPEG-XS offers low complexity image compression for applications with constrained but reasonable bit-rate, and low latency. Our paper explores the deployment of JPEG-XS on lossy packet networks. To preserve low latency, Forward Error Correction (FEC) is envisioned as the protection mechanism of interest. Despite the JPEG-XS codestream is not scalable in essence, we observe that the loss of a codestream fraction impacts the decoded image quality differently, depending on whether this codestream fraction corresponds to codestream headers, to coefficients significance information, or to low/high frequency data, respectively. Hence, we propose a rate-distortion optimal unequal error protection scheme that adapts the redundancy level of Reed-Solomon codes according to the rate of channel losses and the type of information protected by the code. Our experiments demonstrate that, at 5% loss rates, it reduces the Mean Squared Error by up to 92% and 65%, compared to a transmission without and with optimal but equal protection, respectively.

LGOct 4, 2025Code
Optimizing Resources for On-the-Fly Label Estimation with Multiple Unknown Medical Experts

Tim Bary, Tiffanie Godelaine, Axel Abels et al.

Accurate ground truth estimation in medical screening programs often relies on coalitions of experts and peer second opinions. Algorithms that efficiently aggregate noisy annotations can enhance screening workflows, particularly when data arrive continuously and expert proficiency is initially unknown. However, existing algorithms do not meet the requirements for seamless integration into screening pipelines. We therefore propose an adaptive approach for real-time annotation that (I) supports on-the-fly labeling of incoming data, (II) operates without prior knowledge of medical experts or pre-labeled data, and (III) dynamically queries additional experts based on the latent difficulty of each instance. The method incrementally gathers expert opinions until a confidence threshold is met, providing accurate labels with reduced annotation overhead. We evaluate our approach on three multi-annotator classification datasets across different modalities. Results show that our adaptive querying strategy reduces the number of expert queries by up to 50% while achieving accuracy comparable to a non-adaptive baseline. Our code is available at https://github.com/tbary/MEDICS

CVApr 27, 2024
Multi-Stream Cellular Test-Time Adaptation of Real-Time Models Evolving in Dynamic Environments

Benoît Gérin, Anaïs Halin, Anthony Cioppa et al.

In the era of the Internet of Things (IoT), objects connect through a dynamic network, empowered by technologies like 5G, enabling real-time data sharing. However, smart objects, notably autonomous vehicles, face challenges in critical local computations due to limited resources. Lightweight AI models offer a solution but struggle with diverse data distributions. To address this limitation, we propose a novel Multi-Stream Cellular Test-Time Adaptation (MSC-TTA) setup where models adapt on the fly to a dynamic environment divided into cells. Then, we propose a real-time adaptive student-teacher method that leverages the multiple streams available in each cell to quickly adapt to changing data distributions. We validate our methodology in the context of autonomous vehicles navigating across cells defined based on location and weather conditions. To facilitate future benchmarking, we release a new multi-stream large-scale synthetic semantic segmentation dataset, called DADE, and show that our multi-stream approach outperforms a single-stream baseline. We believe that our work will open research opportunities in the IoT and 5G eras, offering solutions for real-time model adaptation.

LGSep 16, 2025
No Need for Learning to Defer? A Training Free Deferral Framework to Multiple Experts through Conformal Prediction

Tim Bary, Benoît Macq, Louis Petit

AI systems often fail to deliver reliable predictions across all inputs, prompting the need for hybrid human-AI decision-making. Existing Learning to Defer (L2D) approaches address this by training deferral models, but these are sensitive to changes in expert composition and require significant retraining if experts change. We propose a training-free, model- and expert-agnostic framework for expert deferral based on conformal prediction. Our method uses the prediction set generated by a conformal predictor to identify label-specific uncertainty and selects the most discriminative expert using a segregativity criterion, measuring how well an expert distinguishes between the remaining plausible labels. Experiments on CIFAR10-H and ImageNet16-H show that our method consistently outperforms both the standalone model and the strongest expert, with accuracies attaining $99.57\pm0.10\%$ and $99.40\pm0.52\%$, while reducing expert workload by up to a factor of $11$. The method remains robust under degraded expert performance and shows a gradual performance drop in low-information settings. These results suggest a scalable, retraining-free alternative to L2D for real-world human-AI collaboration.

CVNov 22, 2024
Physically Interpretable Probabilistic Domain Characterization

Anaïs Halin, Sébastien Piérard, Renaud Vandeghen et al.

Characterizing domains is essential for models analyzing dynamic environments, as it allows them to adapt to evolving conditions or to hand the task over to backup systems when facing conditions outside their operational domain. Existing solutions typically characterize a domain by solving a regression or classification problem, which limits their applicability as they only provide a limited summarized description of the domain. In this paper, we present a novel approach to domain characterization by characterizing domains as probability distributions. Particularly, we develop a method to predict the likelihood of different weather conditions from images captured by vehicle-mounted cameras by estimating distributions of physical parameters using normalizing flows. To validate our proposed approach, we conduct experiments within the context of autonomous vehicles, focusing on predicting the distribution of weather parameters to characterize the operational domain. This domain is characterized by physical parameters (absolute characterization) and arbitrarily predefined domains (relative characterization). Finally, we evaluate whether a system can safely operate in a target domain by comparing it to multiple source domains where safety has already been established. This approach holds significant potential, as accurate weather prediction and effective domain adaptation are crucial for autonomous systems to adjust to dynamic environmental conditions.

CVSep 24, 2025
Data-Efficient Stream-Based Active Distillation for Scalable Edge Model Deployment

Dani Manjah, Tim Bary, Benoît Gérin et al.

Edge camera-based systems are continuously expanding, facing ever-evolving environments that require regular model updates. In practice, complex teacher models are run on a central server to annotate data, which is then used to train smaller models tailored to the edge devices with limited computational power. This work explores how to select the most useful images for training to maximize model quality while keeping transmission costs low. Our work shows that, for a similar training load (i.e., iterations), a high-confidence stream-based strategy coupled with a diversity-based approach produces a high-quality model with minimal dataset queries.

CVJul 14, 2025
Leveraging Swin Transformer for enhanced diagnosis of Alzheimer's disease using multi-shell diffusion MRI

Quentin Dessain, Nicolas Delinte, Bernard Hanseeuw et al.

Objective: This study aims to support early diagnosis of Alzheimer's disease and detection of amyloid accumulation by leveraging the microstructural information available in multi-shell diffusion MRI (dMRI) data, using a vision transformer-based deep learning framework. Methods: We present a classification pipeline that employs the Swin Transformer, a hierarchical vision transformer model, on multi-shell dMRI data for the classification of Alzheimer's disease and amyloid presence. Key metrics from DTI and NODDI were extracted and projected onto 2D planes to enable transfer learning with ImageNet-pretrained models. To efficiently adapt the transformer to limited labeled neuroimaging data, we integrated Low-Rank Adaptation. We assessed the framework on diagnostic group prediction (cognitively normal, mild cognitive impairment, Alzheimer's disease dementia) and amyloid status classification. Results: The framework achieved competitive classification results within the scope of multi-shell dMRI-based features, with the best balanced accuracy of 95.2% for distinguishing cognitively normal individuals from those with Alzheimer's disease dementia using NODDI metrics. For amyloid detection, it reached 77.2% balanced accuracy in distinguishing amyloid-positive mild cognitive impairment/Alzheimer's disease dementia subjects from amyloid-negative cognitively normal subjects, and 67.9% for identifying amyloid-positive individuals among cognitively normal subjects. Grad-CAM-based explainability analysis identified clinically relevant brain regions, including the parahippocampal gyrus and hippocampus, as key contributors to model predictions. Conclusion: This study demonstrates the promise of diffusion MRI and transformer-based architectures for early detection of Alzheimer's disease and amyloid pathology, supporting biomarker-driven diagnostics in data-limited biomedical settings.

IVNov 22, 2024
Exploring Foundation Models Fine-Tuning for Cytology Classification

Manon Dausort, Tiffanie Godelaine, Maxime Zanella et al.

Cytology slides are essential tools in diagnosing and staging cancer, but their analysis is time-consuming and costly. Foundation models have shown great potential to assist in these tasks. In this paper, we explore how existing foundation models can be applied to cytological classification. More particularly, we focus on low-rank adaptation, a parameter-efficient fine-tuning method suited to few-shot learning. We evaluated five foundation models across four cytological classification datasets. Our results demonstrate that fine-tuning the pre-trained backbones with LoRA significantly improves model performance compared to fine-tuning only the classifier head, achieving state-of-the-art results on both simple and complex classification tasks while requiring fewer data samples.

CVJun 16, 2015
Post-Reconstruction Deconvolution of PET Images by Total Generalized Variation Regularization

Stéphanie Guérit, Laurent Jacques, Benoît Macq et al.

Improving the quality of positron emission tomography (PET) images, affected by low resolution and high level of noise, is a challenging task in nuclear medicine and radiotherapy. This work proposes a restoration method, achieved after tomographic reconstruction of the images and targeting clinical situations where raw data are often not accessible. Based on inverse problem methods, our contribution introduces the recently developed total generalized variation (TGV) norm to regularize PET image deconvolution. Moreover, we stabilize this procedure with additional image constraints such as positivity and photometry invariance. A criterion for updating and adjusting automatically the regularization parameter in case of Poisson noise is also presented. Experiments are conducted on both synthetic data and real patient images.