Pietro Gori

CV
h-index38
43papers
559citations
Novelty50%
AI Score58

43 Papers

51.6CVJun 4
Diff-CA: Separating Common and Salient Factors with Diffusion Models

Michaël Soumm, Alexandre Fournier Montgieux, Yunlong He et al.

Contrastive Analysis aims to separate factors that are common between two data distributions from those that are salient to only one of them. Existing contrastive methods are based on generative models (e.g., VAEs or GANs) that often suffer from limited reconstruction and image quality, which hampers effective latent factor separation and limits their applicability to high-fidelity image generation and edition. We propose a novel conditioning framework for diffusion models that enables contrastive decomposition without compromising generation quality. We first train a prompt-free, image-conditioned diffusion model, and then learn to decompose the conditioning into a common and a salient factor, using weak supervision. We prove that the additive contrastive factorization, commonly assumed in prior work, is identifiable under mild conditions. This factorization enables targeted operations by swapping or interpolating only the salient factor.

CVJul 12, 2023Code
SepVAE: a contrastive VAE to separate pathological patterns from healthy ones

Robin Louiset, Edouard Duchesnay, Antoine Grigis et al.

Contrastive Analysis VAE (CA-VAEs) is a family of Variational auto-encoders (VAEs) that aims at separating the common factors of variation between a background dataset (BG) (i.e., healthy subjects) and a target dataset (TG) (i.e., patients) from the ones that only exist in the target dataset. To do so, these methods separate the latent space into a set of salient features (i.e., proper to the target dataset) and a set of common features (i.e., exist in both datasets). Currently, all models fail to prevent the sharing of information between latent spaces effectively and to capture all salient factors of variation. To this end, we introduce two crucial regularization losses: a disentangling term between common and salient representations and a classification term between background and target samples in the salient space. We show a better performance than previous CA-VAEs methods on three medical applications and a natural images dataset (CelebA). Code and datasets are available on GitHub https://github.com/neurospin-projects/2023_rlouiset_sepvae.

CVAug 18, 2023Code
Decoupled conditional contrastive learning with variable metadata for prostate lesion detection

Camille Ruppli, Pietro Gori, Roberto Ardon et al.

Early diagnosis of prostate cancer is crucial for efficient treatment. Multi-parametric Magnetic Resonance Images (mp-MRI) are widely used for lesion detection. The Prostate Imaging Reporting and Data System (PI-RADS) has standardized interpretation of prostate MRI by defining a score for lesion malignancy. PI-RADS data is readily available from radiology reports but is subject to high inter-reports variability. We propose a new contrastive loss function that leverages weak metadata with multiple annotators per sample and takes advantage of inter-reports variability by defining metadata confidence. By combining metadata of varying confidence with unannotated data into a single conditional contrastive loss function, we report a 3% AUC increase on lesion detection on the public PI-CAI challenge dataset. Code is available at: https://github.com/camilleruppli/decoupled_ccl

CVJul 10, 2023Code
Weakly-supervised positional contrastive learning: application to cirrhosis classification

Emma Sarfati, Alexandre Bône, Marc-Michel Rohé et al.

Large medical imaging datasets can be cheaply and quickly annotated with low-confidence, weak labels (e.g., radiological scores). Access to high-confidence labels, such as histology-based diagnoses, is rare and costly. Pretraining strategies, like contrastive learning (CL) methods, can leverage unlabeled or weakly-annotated datasets. These methods typically require large batch sizes, which poses a difficulty in the case of large 3D images at full resolution, due to limited GPU memory. Nevertheless, volumetric positional information about the spatial context of each 2D slice can be very important for some medical applications. In this work, we propose an efficient weakly-supervised positional (WSP) contrastive learning strategy where we integrate both the spatial context of each 2D slice and a weak label via a generic kernel-based loss function. We illustrate our method on cirrhosis prediction using a large volume of weakly-labeled images, namely radiological low-confidence annotations, and small strongly-labeled (i.e., high-confidence) datasets. The proposed model improves the classification AUC by 5% with respect to a baseline model on our internal dataset, and by 26% on the public LIHC dataset from the Cancer Genome Atlas. The code is available at: https://github.com/Guerbet-AI/wsp-contrastive.

62.7LGMay 20Code
Automatic Discovery of Disease Subgroups by Contrasting with Healthy Controls

Robin Louiset, Edouard Duchesnay, Benoit Dufumier et al.

In biomedical Subgroup Discovery, practitioners are interested in discovering interpretable and homogeneous subgroups within a group of patients. In this paper, assuming that healthy subjects (i.e., controls) share common but irrelevant factors of variation with the patients, we motivate and develop a Contrastive Subgroup Discovery method, entitled Deep UCSL. By contrasting patients with controls, Deep UCSL identifies subgroups driven solely by pathological factors, ignoring common variability shared with healthy subjects. Our framework employs a deep feature extractor to learn a discriminative representation space. Mathematically, we derive a novel loss based on the conditional joint likelihood of latent clusters and patient/control labels, optimized via an Expectation-Maximization strategy alternating between subgroup inference and feature encoder updates. A regularization term further encourages representations to capture disease-specific variability while ignoring variability shared with controls. Compared to previous related works, our approach quantitatively improves the quality of the estimated subgroups, as demonstrated on a MNIST example and four distinct real medical imaging datasets. Code and datasets are available at: https://github.com/rlouiset/deep_ucsl.

LGNov 10, 2022
Unbiased Supervised Contrastive Learning

Carlo Alberto Barbano, Benoit Dufumier, Enzo Tartaglione et al.

Many datasets are biased, namely they contain easy-to-learn features that are highly correlated with the target class only in the dataset but not in the true underlying distribution of the data. For this reason, learning unbiased models from biased data has become a very relevant research topic in the last years. In this work, we tackle the problem of learning representations that are robust to biases. We first present a margin-based theoretical framework that allows us to clarify why recent contrastive losses (InfoNCE, SupCon, etc.) can fail when dealing with biased data. Based on that, we derive a novel formulation of the supervised contrastive loss (epsilon-SupInfoNCE), providing more accurate control of the minimal distance between positive and negative samples. Furthermore, thanks to our theoretical framework, we also propose FairKL, a new debiasing regularization loss, that works well even with extremely biased data. We validate the proposed losses on standard vision datasets including CIFAR10, CIFAR100, and ImageNet, and we assess the debiasing capability of FairKL with epsilon-SupInfoNCE, reaching state-of-the-art performance on a number of biased datasets, including real instances of biases in the wild.

IVOct 4, 2022
Anatomically constrained CT image translation for heterogeneous blood vessel segmentation

Giammarco La Barbera, Haithem Boussaid, Francesco Maso et al.

Anatomical structures such as blood vessels in contrast-enhanced CT (ceCT) images can be challenging to segment due to the variability in contrast medium diffusion. The combined use of ceCT and contrast-free (CT) CT images can improve the segmentation performances, but at the cost of a double radiation exposure. To limit the radiation dose, generative models could be used to synthesize one modality, instead of acquiring it. The CycleGAN approach has recently attracted particular attention because it alleviates the need for paired data that are difficult to obtain. Despite the great performances demonstrated in the literature, limitations still remain when dealing with 3D volumes generated slice by slice from unpaired datasets with different fields of view. We present an extension of CycleGAN to generate high fidelity images, with good structural consistency, in this context. We leverage anatomical constraints and automatic region of interest selection by adapting the Self-Supervised Body Regressor. These constraints enforce anatomical consistency and allow feeding anatomically-paired input images to the algorithm. Results show qualitative and quantitative improvements, compared to stateof-the-art methods, on the translation task between ceCT and CT images (and vice versa).

IVNov 14, 2022
Contrastive learning for regression in multi-site brain age prediction

Carlo Alberto Barbano, Benoit Dufumier, Edouard Duchesnay et al.

Building accurate Deep Learning (DL) models for brain age prediction is a very relevant topic in neuroimaging, as it could help better understand neurodegenerative disorders and find new biomarkers. To estimate accurate and generalizable models, large datasets have been collected, which are often multi-site and multi-scanner. This large heterogeneity negatively affects the generalization performance of DL models since they are prone to overfit site-related noise. Recently, contrastive learning approaches have been shown to be more robust against noise in data or labels. For this reason, we propose a novel contrastive learning regression loss for robust brain age prediction using MRI scans. Our method achieves state-of-the-art performance on the OpenBHB challenge, yielding the best generalization capability and robustness to site-related noise.

IVAug 22, 2023
Non-Redundant Combination of Hand-Crafted and Deep Learning Radiomics: Application to the Early Detection of Pancreatic Cancer

Rebeca Vétil, Clément Abi-Nader, Alexandre Bône et al.

We address the problem of learning Deep Learning Radiomics (DLR) that are not redundant with Hand-Crafted Radiomics (HCR). To do so, we extract DLR features using a VAE while enforcing their independence with HCR features by minimizing their mutual information. The resulting DLR features can be combined with hand-crafted ones and leveraged by a classifier to predict early markers of cancer. We illustrate our method on four early markers of pancreatic cancer and validate it on a large independent test set. Our results highlight the value of combining non-redundant DLR and HCR features, as evidenced by an improvement in the Area Under the Curve compared to baseline methods that do not address redundancy or solely rely on HCR features.

CVOct 21, 2022
Learning shape distributions from large databases of healthy organs: applications to zero-shot and few-shot abnormal pancreas detection

Rebeca Vétil, Clément Abi Nader, Alexandre Bône et al.

We propose a scalable and data-driven approach to learn shape distributions from large databases of healthy organs. To do so, volumetric segmentation masks are embedded into a common probabilistic shape space that is learned with a variational auto-encoding network. The resulting latent shape representations are leveraged to derive zeroshot and few-shot methods for abnormal shape detection. The proposed distribution learning approach is illustrated on a large database of 1200 healthy pancreas shapes. Downstream qualitative and quantitative experiments are conducted on a separate test set of 224 pancreas from patients with mixed conditions. The abnormal pancreas detection AUC reached up to 65.41% in the zero-shot configuration, and 78.97% in the few-shot configuration with as few as 15 abnormal examples, outperforming a baseline approach based on the sole volume.

CVJun 3, 2022
Integrating Prior Knowledge in Contrastive Learning with Kernel

Benoit Dufumier, Carlo Alberto Barbano, Robin Louiset et al.

Data augmentation is a crucial component in unsupervised contrastive learning (CL). It determines how positive samples are defined and, ultimately, the quality of the learned representation. In this work, we open the door to new perspectives for CL by integrating prior knowledge, given either by generative models -- viewed as prior representations -- or weak attributes in the positive and negative sampling. To this end, we use kernel theory to propose a novel loss, called decoupled uniformity, that i) allows the integration of prior knowledge and ii) removes the negative-positive coupling in the original InfoNCE loss. We draw a connection between contrastive learning and conditional mean embedding theory to derive tight bounds on the downstream classification loss. In an unsupervised setting, we empirically demonstrate that CL benefits from generative models to improve its representation both on natural and medical images. In a weakly supervised scenario, our framework outperforms other unconditional and conditional CL approaches.

CVFeb 16, 2023
Learning to diagnose cirrhosis from radiological and histological labels with joint self and weakly-supervised pretraining strategies

Emma Sarfati, Alexandre Bone, Marc-Michel Rohe et al.

Identifying cirrhosis is key to correctly assess the health of the liver. However, the gold standard diagnosis of the cirrhosis needs a medical intervention to obtain the histological confirmation, e.g. the METAVIR score, as the radiological presentation can be equivocal. In this work, we propose to leverage transfer learning from large datasets annotated by radiologists, which we consider as a weak annotation, to predict the histological score available on a small annex dataset. To this end, we propose to compare different pretraining methods, namely weakly-supervised and self-supervised ones, to improve the prediction of the cirrhosis. Finally, we introduce a loss function combining both supervised and self-supervised frameworks for pretraining. This method outperforms the baseline classification of the METAVIR score, reaching an AUC of 0.84 and a balanced accuracy of 0.75, compared to 0.77 and 0.72 for a baseline classifier.

60.6CVMay 12Code
Contrastive Learning under Noisy Temporal Self-Supervision for Colonoscopy Videos

Luca Parolari, Pietro Gori, Lamberto Ballan et al.

Learning robust representations of polyp tracklets is key to enabling multiple AI-assisted colonoscopy applications, from polyp characterization to automated reporting and retrieval. Supervised contrastive learning is an effective approach for learning such representations, but it typically relies on correct positive and negative definitions. Collecting these labels requires linking tracklets that depict the same underlying polyp entity throughout the video, which is costly and demands specialized clinical expertise. In this work, we leverage the sequential workflow of colonoscopy procedures to derive self-supervised associations from temporal structure. Since temporally derived associations are not guaranteed to be correct, we introduce a noise-aware contrastive loss to account for noisy associations. We demonstrate the effectiveness of the learned representations across multiple downstream tasks, including polyp retrieval and re-identification, size estimation, and histology classification. Our method outperforms prior self-supervised and supervised baselines, and matches or exceeds recent foundation models across all tasks, using a lightweight encoder trained on only 27 videos. Code is available at https://github.com/lparolari/ntssl.

CVMay 12, 2022
Real-time Virtual-Try-On from a Single Example Image through Deep Inverse Graphics and Learned Differentiable Renderers

Robin Kips, Ruowei Jiang, Sileye Ba et al.

Augmented reality applications have rapidly spread across online platforms, allowing consumers to virtually try-on a variety of products, such as makeup, hair dying, or shoes. However, parametrizing a renderer to synthesize realistic images of a given product remains a challenging task that requires expert knowledge. While recent work has introduced neural rendering methods for virtual try-on from example images, current approaches are based on large generative models that cannot be used in real-time on mobile devices. This calls for a hybrid method that combines the advantages of computer graphics and neural rendering approaches. In this paper we propose a novel framework based on deep learning to build a real-time inverse graphics encoder that learns to map a single example image into the parameter space of a given augmented reality rendering engine. Our method leverages self-supervised learning and does not require labeled training data which makes it extendable to many virtual try-on applications. Furthermore, most augmented reality renderers are not differentiable in practice due to algorithmic choices or implementation constraints to reach real-time on portable devices. To relax the need for a graphics-based differentiable renderer in inverse graphics problems, we introduce a trainable imitator module. Our imitator is a generative network that learns to accurately reproduce the behavior of a given non-differentiable renderer. We propose a novel rendering sensitivity loss to train the imitator, which ensures that the network learns an accurate and continuous representation for each rendering parameter. Our framework enables novel applications where consumers can virtually try-on a novel unknown product from an inspirational reference image on social media. It can also be used by graphics artists to automatically create realistic rendering from a reference product image.

LGJul 27, 2022
Optimizing transformations for contrastive learning in a differentiable framework

Camille Ruppli, Pietro Gori, Roberto Ardon et al.

Current contrastive learning methods use random transformations sampled from a large list of transformations, with fixed hyperparameters, to learn invariance from an unannotated database. Following previous works that introduce a small amount of supervision, we propose a framework to find optimal transformations for contrastive learning using a differentiable transformation network. Our method increases performances at low annotated data regime both in supervision accuracy and in convergence speed. In contrast to previous work, no generative model is needed for transformation optimization. Transformed images keep relevant information to solve the supervised task, here classification. Experiments were performed on 34000 2D slices of brain Magnetic Resonance Images and 11200 chest X-ray images. On both datasets, with 10% of labeled data, our model achieves better performances than a fully supervised model with 100% labels.

CVJul 6, 2022
Is the U-Net Directional-Relationship Aware?

Mateus Riva, Pietro Gori, Florian Yger et al.

CNNs are often assumed to be capable of using contextual information about distinct objects (such as their directional relations) inside their receptive field. However, the nature and limits of this capacity has never been explored in full. We explore a specific type of relationship~-- directional~-- using a standard U-Net trained to optimize a cross-entropy loss function for segmentation. We train this network on a pretext segmentation task requiring directional relation reasoning for success and state that, with enough data and a sufficiently large receptive field, it succeeds to learn the proposed task. We further explore what the network has learned by analysing scenarios where the directional relationships are perturbed, and show that the network has learned to reason using these relationships.

13.9CVMar 10
Unsupervised Domain Adaptation with Target-Only Margin Disparity Discrepancy

Gauthier Miralles, Loïc Le Folgoc, Vincent Jugnon et al.

In interventional radiology, Cone-Beam Computed Tomography (CBCT) is a helpful imaging modality that provides guidance to practicians during minimally invasive procedures. CBCT differs from traditional Computed Tomography (CT) due to its limited reconstructed field of view, specific artefacts, and the intra-arterial administration of contrast medium. While CT benefits from abundant publicly available annotated datasets, interventional CBCT data remain scarce and largely unannotated, with existing datasets focused primarily on radiotherapy applications. To address this limitation, we leverage a proprietary collection of unannotated interventional CBCT scans in conjunction with annotated CT data, employing domain adaptation techniques to bridge the modality gap and enhance liver segmentation performance on CBCT. We propose a novel unsupervised domain adaptation (UDA) framework based on the formalism of Margin Disparity Discrepancy (MDD), which improves target domain performance through a reformulation of the original MDD optimization framework. Experimental results on CT and CBCT datasets for liver segmentation demonstrate that our method achieves state-of-the-art performance in UDA, as well as in the few-shot setting.

CVFeb 19, 2024Code
Separating common from salient patterns with Contrastive Representation Learning

Robin Louiset, Edouard Duchesnay, Antoine Grigis et al.

Contrastive Analysis is a sub-field of Representation Learning that aims at separating common factors of variation between two datasets, a background (i.e., healthy subjects) and a target (i.e., diseased subjects), from the salient factors of variation, only present in the target dataset. Despite their relevance, current models based on Variational Auto-Encoders have shown poor performance in learning semantically-expressive representations. On the other hand, Contrastive Representation Learning has shown tremendous performance leaps in various applications (classification, clustering, etc.). In this work, we propose to leverage the ability of Contrastive Learning to learn semantically expressive representations well adapted for Contrastive Analysis. We reformulate it under the lens of the InfoMax Principle and identify two Mutual Information terms to maximize and one to minimize. We decompose the first two terms into an Alignment and a Uniformity term, as commonly done in Contrastive Learning. Then, we motivate a novel Mutual Information minimization strategy to prevent information leakage between common and salient distributions. We validate our method, called SepCLR, on three visual datasets and three medical datasets, specifically conceived to assess the pattern separation capability in Contrastive Analysis. Code available at https://github.com/neurospin-projects/2024_rlouiset_sep_clr.

MLJul 5, 2021Code
UCSL : A Machine Learning Expectation-Maximization framework for Unsupervised Clustering driven by Supervised Learning

Robin Louiset, Pietro Gori, Benoit Dufumier et al.

Subtype Discovery consists in finding interpretable and consistent sub-parts of a dataset, which are also relevant to a certain supervised task. From a mathematical point of view, this can be defined as a clustering task driven by supervised learning in order to uncover subgroups in line with the supervised prediction. In this paper, we propose a general Expectation-Maximization ensemble framework entitled UCSL (Unsupervised Clustering driven by Supervised Learning). Our method is generic, it can integrate any clustering method and can be driven by both binary classification and regression. We propose to construct a non-linear model by merging multiple linear estimators, one per cluster. Each hyperplane is estimated so that it correctly discriminates - or predict - only one cluster. We use SVC or Logistic Regression for classification and SVR for regression. Furthermore, to perform cluster analysis within a more suitable space, we also propose a dimension-reduction algorithm that projects the data onto an orthonormal space relevant to the supervised task. We analyze the robustness and generalization capability of our algorithm using synthetic and experimental datasets. In particular, we validate its ability to identify suitable consistent sub-types by conducting a psychiatric-diseases cluster analysis with known ground-truth labels. The gain of the proposed method over previous state-of-the-art techniques is about +1.9 points in terms of balanced accuracy. Finally, we make codes and examples available in a scikit-learn-compatible Python package at https://github.com/neurospin-projects/2021_rlouiset_ucsl

CVJun 16, 2021Code
Metamorphic image registration using a semi-Lagrangian scheme

Anton François, Pietro Gori, Joan Glaunès

In this paper, we propose an implementation of both Large Deformation Diffeomorphic Metric Mapping (LDDMM) and Metamorphosis image registration using a semi-Lagrangian scheme for geodesic shooting. We propose to solve both problems as an inexact matching providing a single and unifying cost function. We demonstrate that for image registration the use of a semi-Lagrangian scheme is more stable than a standard Eulerian scheme. Our GPU implementation is based on PyTorch, which greatly simplifies and accelerates the computations thanks to its powerful automatic differentiation engine. It will be freely available at https://github.com/antonfrancois/Demeter_metamorphosis.

25.7CVMar 10
Leveraging whole slide difficulty in Multiple Instance Learning to improve prostate cancer grading

Marie Arrivat, Rémy Peyret, Elsa Angelini et al.

Multiple Instance Learning (MIL) has been widely applied in histopathology to classify Whole Slide Images (WSIs) with slide-level diagnoses. While the ground truth is established by expert pathologists, the slides can be difficult to diagnose for non-experts and lead to disagreements between the annotators. In this paper, we introduce the notion of Whole Slide Difficulty (WSD), based on the disagreement between an expert and a non-expert pathologist. We propose two different methods to leverage WSD, a multi-task approach and a weighted classification loss approach, and we apply them to Gleason grading of prostate cancer slides. Results show that integrating WSD during training consistently improves the classification performance across different feature encoders and MIL methods, particularly for higher Gleason grades (i.e. worse diagnosis).

CVJan 31, 2024
Double InfoGAN for Contrastive Analysis

Florence Carton, Robin Louiset, Pietro Gori

Contrastive Analysis (CA) deals with the discovery of what is common and what is distinctive of a target domain compared to a background one. This is of great interest in many applications, such as medical imaging. Current state-of-the-art (SOTA) methods are latent variable models based on VAE (CA-VAEs). However, they all either ignore important constraints or they don't enforce fundamental assumptions. This may lead to sub-optimal solutions where distinctive factors are mistaken for common ones (or viceversa). Furthermore, the generated images have a rather poor quality, typical of VAEs, decreasing their interpretability and usefulness. Here, we propose Double InfoGAN, the first GAN based method for CA that leverages the high-quality synthesis of GAN and the separation power of InfoGAN. Experimental results on four visual datasets, from simple synthetic examples to complex medical images, show that the proposed method outperforms SOTA CA-VAEs in terms of latent separation and image quality. Datasets and code are available online.

CVMay 2, 2025
Self-Supervision Enhances Instance-based Multiple Instance Learning Methods in Digital Pathology: A Benchmark Study

Ali Mammadov, Loic Le Folgoc, Julien Adam et al.

Multiple Instance Learning (MIL) has emerged as the best solution for Whole Slide Image (WSI) classification. It consists of dividing each slide into patches, which are treated as a bag of instances labeled with a global label. MIL includes two main approaches: instance-based and embedding-based. In the former, each patch is classified independently, and then the patch scores are aggregated to predict the bag label. In the latter, bag classification is performed after aggregating patch embeddings. Even if instance-based methods are naturally more interpretable, embedding-based MILs have usually been preferred in the past due to their robustness to poor feature extractors. However, recently, the quality of feature embeddings has drastically increased using self-supervised learning (SSL). Nevertheless, many authors continue to endorse the superiority of embedding-based MIL. To investigate this further, we conduct 710 experiments across 4 datasets, comparing 10 MIL strategies, 6 self-supervised methods with 4 backbones, 4 foundation models, and various pathology-adapted techniques. Furthermore, we introduce 4 instance-based MIL methods never used before in the pathology domain. Through these extensive experiments, we show that with a good SSL feature extractor, simple instance-based MILs, with very few parameters, obtain similar or better performance than complex, state-of-the-art (SOTA) embedding-based MIL methods, setting new SOTA results on the BRACS and Camelyon16 datasets. Since simple instance-based MIL methods are naturally more interpretable and explainable to clinicians, our results suggest that more effort should be put into well-adapted SSL methods for WSI rather than into complex embedding-based MIL methods.

AIApr 30, 2025
First Order Logic with Fuzzy Semantics for Describing and Recognizing Nerves in Medical Images

Isabelle Bloch, Enzo Bonnot, Pietro Gori et al.

This article deals with the description and recognition of fiber bundles, in particular nerves, in medical images, based on the anatomical description of the fiber trajectories. To this end, we propose a logical formalization of this anatomical knowledge. The intrinsically imprecise description of nerves, as found in anatomical textbooks, leads us to propose fuzzy semantics combined with first-order logic. We define a language representing spatial entities, relations between these entities and quantifiers. A formula in this language is then a formalization of the natural language description. The semantics are given by fuzzy representations in a concrete domain and satisfaction degrees of relations. Based on this formalization, a spatial reasoning algorithm is proposed for segmentation and recognition of nerves from anatomical and diffusion magnetic resonance images, which is illustrated on pelvic nerves in pediatric imaging, enabling surgeons to plan surgery.

CVDec 14, 2025
Learning Common and Salient Generative Factors Between Two Image Datasets

Yunlong He, Gwilherm Lesné, Ziqian Liu et al.

Recent advancements in image synthesis have enabled high-quality image generation and manipulation. Most works focus on: 1) conditional manipulation, where an image is modified conditioned on a given attribute, or 2) disentangled representation learning, where each latent direction should represent a distinct semantic attribute. In this paper, we focus on a different and less studied research problem, called Contrastive Analysis (CA). Given two image datasets, we want to separate the common generative factors, shared across the two datasets, from the salient ones, specific to only one dataset. Compared to existing methods, which use attributes as supervised signals for editing (e.g., glasses, gender), the proposed method is weaker, since it only uses the dataset signal. We propose a novel framework for CA, that can be adapted to both GAN and Diffusion models, to learn both common and salient factors. By defining new and well-adapted learning strategies and losses, we ensure a relevant separation between common and salient factors, preserving a high-quality generation. We evaluate our approach on diverse datasets, covering human faces, animal images and medical scans. Our framework demonstrates superior separation ability and image quality synthesis compared to prior methods.

CVSep 18, 2025
Visionerves: Automatic and Reproducible Hybrid AI for Peripheral Nervous System Recognition Applied to Endometriosis Cases

Giammarco La Barbera, Enzo Bonnot, Thomas Isla et al.

Endometriosis often leads to chronic pelvic pain and possible nerve involvement, yet imaging the peripheral nerves remains a challenge. We introduce Visionerves, a novel hybrid AI framework for peripheral nervous system recognition from multi-gradient DWI and morphological MRI data. Unlike conventional tractography, Visionerves encodes anatomical knowledge through fuzzy spatial relationships, removing the need for selection of manual ROIs. The pipeline comprises two phases: (A) automatic segmentation of anatomical structures using a deep learning model, and (B) tractography and nerve recognition by symbolic spatial reasoning. Applied to the lumbosacral plexus in 10 women with (confirmed or suspected) endometriosis, Visionerves demonstrated substantial improvements over standard tractography, with Dice score improvements of up to 25% and spatial errors reduced to less than 5 mm. This automatic and reproducible approach enables detailed nerve analysis and paves the way for non-invasive diagnosis of endometriosis-related neuropathy, as well as other conditions with nerve involvement.

IVJul 2, 2025
Robust brain age estimation from structural MRI with contrastive learning

Carlo Alberto Barbano, Benoit Dufumier, Edouard Duchesnay et al.

Estimating brain age from structural MRI has emerged as a powerful tool for characterizing normative and pathological aging. In this work, we explore contrastive learning as a scalable and robust alternative to supervised approaches for brain age estimation. We introduce a novel contrastive loss function, $\mathcal{L}^{exp}$, and evaluate it across multiple public neuroimaging datasets comprising over 20,000 scans. Our experiments reveal four key findings. First, scaling pre-training on diverse, multi-site data consistently improves generalization performance, cutting external mean absolute error (MAE) nearly in half. Second, $\mathcal{L}^{exp}$ is robust to site-related confounds, maintaining low scanner-predictability as training size increases. Third, contrastive models reliably capture accelerated aging in patients with cognitive impairment and Alzheimer's disease, as shown through brain age gap analysis, ROC curves, and longitudinal trends. Lastly, unlike supervised baselines, $\mathcal{L}^{exp}$ maintains a strong correlation between brain age accuracy and downstream diagnostic performance, supporting its potential as a foundation model for neuroimaging. These results position contrastive learning as a promising direction for building generalizable and clinically meaningful brain representations.

CVJun 30, 2025
Reducing Variability of Multiple Instance Learning Methods for Digital Pathology

Ali Mammadov, Loïc Le Folgoc, Guillaume Hocquet et al.

Digital pathology has revolutionized the field by enabling the digitization of tissue samples into whole slide images (WSIs). However, the high resolution and large size of WSIs present significant challenges when it comes to applying Deep Learning models. As a solution, WSIs are often divided into smaller patches with a global label (\textit{i.e., diagnostic}) per slide, instead of a (too) costly pixel-wise annotation. By treating each slide as a bag of patches, Multiple Instance Learning (MIL) methods have emerged as a suitable solution for WSI classification. A major drawback of MIL methods is their high variability in performance across different runs, which can reach up to 10-15 AUC points on the test set, making it difficult to compare different MIL methods reliably. This variability mainly comes from three factors: i) weight initialization, ii) batch (shuffling) ordering, iii) and learning rate. To address that, we introduce a Multi-Fidelity, Model Fusion strategy for MIL methods. We first train multiple models for a few epochs and average the most stable and promising ones based on validation scores. This approach can be applied to any existing MIL model to reduce performance variability. It also simplifies hyperparameter tuning and improves reproducibility while maintaining computational efficiency. We extensively validate our approach on WSI classification tasks using 2 different datasets, 3 initialization strategies and 5 MIL methods, for a total of more than 2000 experiments.

CVJun 30, 2025
Self-Supervised Multiview Xray Matching

Mohamad Dabboussi, Malo Huard, Yann Gousseau et al.

Accurate interpretation of multi-view radiographs is crucial for diagnosing fractures, muscular injuries, and other anomalies. While significant advances have been made in AI-based analysis of single images, current methods often struggle to establish robust correspondences between different X-ray views, an essential capability for precise clinical evaluations. In this work, we present a novel self-supervised pipeline that eliminates the need for manual annotation by automatically generating a many-to-many correspondence matrix between synthetic X-ray views. This is achieved using digitally reconstructed radiographs (DRR), which are automatically derived from unannotated CT volumes. Our approach incorporates a transformer-based training phase to accurately predict correspondences across two or more X-ray views. Furthermore, we demonstrate that learning correspondences among synthetic X-ray views can be leveraged as a pretraining strategy to enhance automatic multi-view fracture detection on real data. Extensive evaluations on both synthetic and real X-ray datasets show that incorporating correspondences improves performance in multi-view fracture classification.

GRFeb 8, 2022
Hair Color Digitization through Imaging and Deep Inverse Graphics

Robin Kips, Panagiotis-Alexandros Bokaris, Matthieu Perrot et al.

Hair appearance is a complex phenomenon due to hair geometry and how the light bounces on different hair fibers. For this reason, reproducing a specific hair color in a rendering environment is a challenging task that requires manual work and expert knowledge in computer graphics to tune the result visually. While current hair capture methods focus on hair shape estimation many applications could benefit from an automated method for capturing the appearance of a physical hair sample, from augmented/virtual reality to hair dying development. Building on recent advances in inverse graphics and material capture using deep neural networks, we introduce a novel method for hair color digitization. Our proposed pipeline allows capturing the color appearance of a physical hair sample and renders synthetic images of hair with a similar appearance, simulating different hair styles and/or lighting environments. Since rendering realistic hair images requires path-tracing rendering, the conventional inverse graphics approach based on differentiable rendering is untractable. Our method is based on the combination of a controlled imaging device, a path-tracing renderer, and an inverse graphics model based on self-supervised machine learning, which does not require to use differentiable rendering to be trained. We illustrate the performance of our hair digitization method on both real and synthetic images and show that our approach can accurately capture and render hair color.

IVFeb 1, 2022
A deep residual learning implementation of Metamorphosis

Matthis Maillard, Anton François, Joan Glaunès et al.

In medical imaging, most of the image registration methods implicitly assume a one-to-one correspondence between the source and target images (i.e., diffeomorphism). However, this is not necessarily the case when dealing with pathological medical images (e.g., presence of a tumor, lesion, etc.). To cope with this issue, the Metamorphosis model has been proposed. It modifies both the shape and the appearance of an image to deal with the geometrical and topological differences. However, the high computational time and load have hampered its applications so far. Here, we propose a deep residual learning implementation of Metamorphosis that drastically reduces the computational time at inference. Furthermore, we also show that the proposed framework can easily integrate prior knowledge of the localization of topological changes (e.g., segmentation masks) that can act as spatial regularization to correctly disentangle appearance and shape changes. We test our method on the BraTS 2021 dataset, showing that it outperforms current state-of-the-art methods in the alignment of images with brain tumors.

LGNov 10, 2021
Conditional Alignment and Uniformity for Contrastive Learning with Continuous Proxy Labels

Benoit Dufumier, Pietro Gori, Julie Victor et al.

Contrastive Learning has shown impressive results on natural and medical images, without requiring annotated data. However, a particularity of medical images is the availability of meta-data (such as age or sex) that can be exploited for learning representations. Here, we show that the recently proposed contrastive y-Aware InfoNCE loss, that integrates multi-dimensional meta-data, asymptotically optimizes two properties: conditional alignment and global uniformity. Similarly to [Wang, 2020], conditional alignment means that similar samples should have similar features, but conditionally on the meta-data. Instead, global uniformity means that the (normalized) features should be uniformly distributed on the unit hyper-sphere, independently of the meta-data. Here, we propose to define conditional uniformity, relying on the meta-data, that repel only samples with dissimilar meta-data. We show that direct optimization of both conditional alignment and uniformity improves the representations, in terms of linear evaluation, on both CIFAR-100 and a brain MRI dataset.

CVJul 6, 2021
Automatic size and pose homogenization with spatial transformer network to improve and accelerate pediatric segmentation

Giammarco La Barbera, Pietro Gori, Haithem Boussaid et al.

Due to a high heterogeneity in pose and size and to a limited number of available data, segmentation of pediatric images is challenging for deep learning methods. In this work, we propose a new CNN architecture that is pose and scale invariant thanks to the use of Spatial Transformer Network (STN). Our architecture is composed of three sequential modules that are estimated together during training: (i) a regression module to estimate a similarity matrix to normalize the input image to a reference one; (ii) a differentiable module to find the region of interest to segment; (iii) a segmentation module, based on the popular UNet architecture, to delineate the object. Unlike the original UNet, which strives to learn a complex mapping, including pose and scale variations, from a finite training dataset, our segmentation module learns a simpler mapping focusing on images with normalized pose and size. Furthermore, the use of an automatic bounding box detection through STN allows saving time and especially memory, while keeping similar performance. We test the proposed method in kidney and renal tumor segmentation on abdominal pediatric CT scanners. Results indicate that the estimated STN homogenization of size and pose accelerates the segmentation (25h), compared to standard data-augmentation (33h), while obtaining a similar quality for the kidney (88.01\% of Dice score) and improving the renal tumor delineation (from 85.52\% to 87.12\%).

CVJul 5, 2021
Fast and Scalable Optimal Transport for Brain Tractograms

Jean Feydy, Pierre Roussillon, Alain Trouvé et al.

We present a new multiscale algorithm for solving regularized Optimal Transport problems on the GPU, with a linear memory footprint. Relying on Sinkhorn divergences which are convex, smooth and positive definite loss functions, this method enables the computation of transport plans between millions of points in a matter of minutes. We show the effectiveness of this approach on brain tractograms modeled either as bundles of fibers or as track density maps. We use the resulting smooth assignments to perform label transfer for atlas-based segmentation of fiber tractograms. The parameters -- blur and reach -- of our method are meaningful, defining the minimum and maximum distance at which two fibers are compared with each other. They can be set according to anatomical knowledge. Furthermore, we also propose to estimate a probabilistic atlas of a population of track density maps as a Wasserstein barycenter. Our CUDA implementation is endowed with a user-friendly PyTorch interface, freely available on the PyPi repository (pip install geomloss) and at www.kernel-operations.io/geomloss.

MLJul 5, 2021
Template-Based Graph Clustering

Mateus Riva, Florian Yger, Pietro Gori et al.

We propose a novel graph clustering method guided by additional information on the underlying structure of the clusters (or communities). The problem is formulated as the matching of a graph to a template with smaller dimension, hence matching $n$ vertices of the observed graph (to be clustered) to the $k$ vertices of a template graph, using its edges as support information, and relaxed on the set of orthonormal matrices in order to find a $k$ dimensional embedding. With relevant priors that encode the density of the clusters and their relationships, our method outperforms classical methods, especially for challenging cases.

CVJun 17, 2021
Knowledge distillation from multi-modal to mono-modal segmentation networks

Minhao Hu, Matthis Maillard, Ya Zhang et al.

The joint use of multiple imaging modalities for medical image segmentation has been widely studied in recent years. The fusion of information from different modalities has demonstrated to improve the segmentation accuracy, with respect to mono-modal segmentations, in several applications. However, acquiring multiple modalities is usually not possible in a clinical setting due to a limited number of physicians and scanners, and to limit costs and scan time. Most of the time, only one modality is acquired. In this paper, we propose KD-Net, a framework to transfer knowledge from a trained multi-modal network (teacher) to a mono-modal one (student). The proposed method is an adaptation of the generalized distillation framework where the student network is trained on a subset (1 modality) of the teacher's inputs (n modalities). We illustrate the effectiveness of the proposed framework in brain tumor segmentation with the BraTS 2018 dataset. Using different architectures, we show that the student network effectively learns from the teacher and always outperforms the baseline mono-modal network in terms of segmentation accuracy.

CVJun 16, 2021
Contrastive Learning with Continuous Proxy Meta-Data for 3D MRI Classification

Benoit Dufumier, Pietro Gori, Julie Victor et al.

Traditional supervised learning with deep neural networks requires a tremendous amount of labelled data to converge to a good solution. For 3D medical images, it is often impractical to build a large homogeneous annotated dataset for a specific pathology. Self-supervised methods offer a new way to learn a representation of the images in an unsupervised manner with a neural network. In particular, contrastive learning has shown great promises by (almost) matching the performance of fully-supervised CNN on vision tasks. Nonetheless, this method does not take advantage of available meta-data, such as participant's age, viewed as prior knowledge. Here, we propose to leverage continuous proxy metadata, in the contrastive learning framework, by introducing a new loss called y-Aware InfoNCE loss. Specifically, we improve the positive sampling during pre-training by adding more positive examples with similar proxy meta-data with the anchor, assuming they share similar discriminative semantic features.With our method, a 3D CNN model pre-trained on $10^4$ multi-site healthy brain MRI scans can extract relevant features for three classification tasks: schizophrenia, bipolar diagnosis and Alzheimer's detection. When fine-tuned, it also outperforms 3D CNN trained from scratch on these tasks, as well as state-of-the-art self-supervised methods. Our code is made publicly available here.

CVJun 2, 2021
Benchmarking CNN on 3D Anatomical Brain MRI: Architectures, Data Augmentation and Deep Ensemble Learning

Benoit Dufumier, Pietro Gori, Ilaria Battaglia et al.

Deep Learning (DL) and specifically CNN models have become a de facto method for a wide range of vision tasks, outperforming traditional machine learning (ML) methods. Consequently, they drew a lot of attention in the neuroimaging field in particular for phenotype prediction or computer-aided diagnosis. However, most of the current studies often deal with small single-site cohorts, along with a specific pre-processing pipeline and custom CNN architectures, which make them difficult to compare to. We propose an extensive benchmark of recent state-of-the-art (SOTA) 3D CNN, evaluating also the benefits of data augmentation and deep ensemble learning, on both Voxel-Based Morphometry (VBM) pre-processing and quasi-raw images. Experiments were conducted on a large multi-site 3D brain anatomical MRI data-set comprising N=10k scans on 3 challenging tasks: age prediction, sex classification, and schizophrenia diagnosis. We found that all models provide significantly better predictions with VBM images than quasi-raw data. This finding evolved as the training set approaches 10k samples where quasi-raw data almost reach the performance of VBM. Moreover, we showed that linear models perform comparably with SOTA CNN on VBM data. We also demonstrated that DenseNet and tiny-DenseNet, a lighter version that we proposed, provide a good compromise in terms of performance in all data regime. Therefore, we suggest to employ them as the architectures by default. Critically, we also showed that current CNN are still very biased towards the acquisition site, even when trained with N=10k multi-site images. In this context, VBM pre-processing provides an efficient way to limit this site effect. Surprisingly, we did not find any clear benefit from data augmentation techniques. Finally, we proved that deep ensemble learning is well suited to re-calibrate big CNN models without sacrificing performance.

CVMay 12, 2021
Deep Graphics Encoder for Real-Time Video Makeup Synthesis from Example

Robin Kips, Ruowei Jiang, Sileye Ba et al.

While makeup virtual-try-on is now widespread, parametrizing a computer graphics rendering engine for synthesizing images of a given cosmetics product remains a challenging task. In this paper, we introduce an inverse computer graphics method for automatic makeup synthesis from a reference image, by learning a model that maps an example portrait image with makeup to the space of rendering parameters. This method can be used by artists to automatically create realistic virtual cosmetics image samples, or by consumers, to virtually try-on a makeup extracted from their favorite reference image.

CVFeb 22, 2021
Approximation of dilation-based spatial relations to add structural constraints in neural networks

Mateus Riva, Pietro Gori, Florian Yger et al.

Spatial relations between objects in an image have proved useful for structural object recognition. Structural constraints can act as regularization in neural network training, improving generalization capability with small datasets. Several relations can be modeled as a morphological dilation of a reference object with a structuring element representing the semantics of the relation, from which the degree of satisfaction of the relation between another object and the reference object can be derived. However, dilation is not differentiable, requiring an approximation to be used in the context of gradient-descent training of a network. We propose to approximate dilations using convolutions based on a kernel equal to the structuring element. We show that the proposed approximation, even if slightly less accurate than previous approximations, is definitely faster to compute and therefore more suitable for computationally intensive neural network applications.

CVAug 24, 2020
CA-GAN: Weakly Supervised Color Aware GAN for Controllable Makeup Transfer

Robin Kips, Pietro Gori, Matthieu Perrot et al.

While existing makeup style transfer models perform an image synthesis whose results cannot be explicitly controlled, the ability to modify makeup color continuously is a desirable property for virtual try-on applications. We propose a new formulation for the makeup style transfer task, with the objective to learn a color controllable makeup style synthesis. We introduce CA-GAN, a generative model that learns to modify the color of specific objects (e.g. lips or eyes) in the image to an arbitrary target color while preserving background. Since color labels are rare and costly to acquire, our method leverages weakly supervised learning for conditional GANs. This enables to learn a controllable synthesis of complex objects, and only requires a weak proxy of the image attribute that we desire to modify. Finally, we present for the first time a quantitative analysis of makeup style transfer and color control performance.

CVSep 18, 2017
White Matter Fiber Segmentation Using Functional Varifolds

Kuldeep Kumar, Pietro Gori, Benjamin Charlier et al.

The extraction of fibers from dMRI data typically produces a large number of fibers, it is common to group fibers into bundles. To this end, many specialized distance measures, such as MCP, have been used for fiber similarity. However, these distance based approaches require point-wise correspondence and focus only on the geometry of the fibers. Recent publications have highlighted that using microstructure measures along fibers improves tractography analysis. Also, many neurodegenerative diseases impacting white matter require the study of microstructure measures as well as the white matter geometry. Motivated by these, we propose to use a novel computational model for fibers, called functional varifolds, characterized by a metric that considers both the geometry and microstructure measure (e.g. GFA) along the fiber pathway. We use it to cluster fibers with a dictionary learning and sparse coding-based framework, and present a preliminary analysis using HCP data.

MLAug 4, 2017
Comparison of Distances for Supervised Segmentation of White Matter Tractography

Emanuele Olivetti, Giulia Bertò, Pietro Gori et al.

Tractograms are mathematical representations of the main paths of axons within the white matter of the brain, from diffusion MRI data. Such representations are in the form of polylines, called streamlines, and one streamline approximates the common path of tens of thousands of axons. The analysis of tractograms is a task of interest in multiple fields, like neurosurgery and neurology. A basic building block of many pipelines of analysis is the definition of a distance function between streamlines. Multiple distance functions have been proposed in the literature, and different authors use different distances, usually without a specific reason other than invoking the "common practice". To this end, in this work we want to test such common practices, in order to obtain factual reasons for choosing one distance over another. For these reasons, in this work we compare many streamline distance functions available in the literature. We focus on the common task of automatic bundle segmentation and we adopt the recent approach of supervised segmentation from expert-based examples. Using the HCP dataset, we compare several distances obtaining guidelines on the choice of which distance function one should use for supervised bundle segmentation.