Stéphane Canu

CV
20 papers
1,307 citations
Novelty 44%
AI Score 27

20 Papers

CV · Jun 15, 2022
Physically-admissible polarimetric data augmentation for road-scene analysis

Cyprien Ruffino, Rachel Blin, Samia Ainouz et al.

Polarimetric imaging, combined with deep learning, has shown improved performance on various tasks, including scene analysis. However, its robustness may be questioned because of the small size of the training datasets. Although data augmentation could address this issue, polarization modalities are subject to physical feasibility constraints that classical data augmentation techniques do not handle. We therefore propose to use CycleGAN, an image translation technique based on deep generative models that relies solely on unpaired data, to transfer large labeled road-scene datasets to the polarimetric domain. We design several auxiliary loss terms that, alongside the CycleGAN losses, enforce the physical constraints of polarimetric images. The efficiency of this solution is demonstrated on road-scene object detection tasks, where the generated realistic polarimetric images improve car and pedestrian detection performance by up to 9%. The resulting constrained CycleGAN is publicly released, allowing anyone to generate their own polarimetric images.
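
The abstract does not spell out which physical constraints are enforced. As an illustration only, the sketch below checks one standard admissibility condition for linear polarization: the Stokes parameters computed from four polarizer-angle intensities must satisfy S0 > 0 and S1² + S2² ≤ S0². The function names and the tolerance are hypothetical and are not taken from the released code.

```python
import math

def stokes_from_intensities(i0, i45, i90, i135):
    """Linear Stokes parameters from intensities behind polarizers at 0, 45, 90, 135 degrees."""
    # I0 + I90 and I45 + I135 both estimate the total intensity; average the two.
    s0 = 0.5 * (i0 + i45 + i90 + i135)
    s1 = i0 - i90
    s2 = i45 - i135
    return s0, s1, s2

def is_admissible(s0, s1, s2, tol=1e-9):
    """Physical feasibility: positive intensity and degree of polarization at most 1."""
    return s0 > 0 and s1 * s1 + s2 * s2 <= s0 * s0 + tol

def degree_of_linear_polarization(s0, s1, s2):
    """Fraction of the light that is linearly polarized (assumes s0 > 0)."""
    return math.sqrt(s1 * s1 + s2 * s2) / s0

# Fully horizontally polarized light is admissible.
pixel = stokes_from_intensities(1.0, 0.5, 0.0, 0.5)
print(is_admissible(*pixel), degree_of_linear_polarization(*pixel))

# An arbitrary pixel-wise edit of the four channels can break the constraint.
edited = stokes_from_intensities(1.0, 1.0, 0.0, 0.0)
print(is_admissible(*edited))
```

The second call shows why generic augmentation can be a problem: editing the four channels independently can yield a pixel that no physical light state could produce.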

CV · Nov 29, 2021
Similarity Contrastive Estimation for Self-Supervised Soft Contrastive Learning

Julien Denize, Jaonary Rabarisoa, Astrid Orcesi et al.

Contrastive representation learning has proven to be an effective self-supervised learning method. Most successful approaches are based on Noise Contrastive Estimation (NCE) and use different views of an instance as positives that should be contrasted with other instances, called negatives, that are considered as noise. However, several instances in a dataset are drawn from the same distribution and share underlying semantic information. A good data representation should contain relations, or semantic similarity, between the instances. Contrastive learning implicitly learns relations but considering all negatives as noise harms the quality of the learned relations. To circumvent this issue, we propose a novel formulation of contrastive learning using semantic similarity between instances called Similarity Contrastive Estimation (SCE). Our training objective is a soft contrastive learning one. Instead of hard classifying positives and negatives, we estimate from one view of a batch a continuous distribution to push or pull instances based on their semantic similarities. This target similarity distribution is sharpened to eliminate noisy relations. The model predicts for each instance, from another view, the target distribution while contrasting its positive with negatives. Experimental results show that SCE is Top-1 on the ImageNet linear evaluation protocol at 100 pretraining epochs with 72.1% accuracy and is competitive with state-of-the-art algorithms by reaching 75.4% for 200 epochs with multi-crop. We also show that SCE is able to generalize to several tasks. Source code is available here: https://github.com/CEA-LIST/SCE.
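
The soft objective described above can be sketched for a single instance as follows. This is a simplified reading of the abstract, not the code from the linked repository. It assumes that the target mixes a one-hot positive with a sharpened similarity distribution over the negatives. The mixing weight `lam`, the two temperatures and all function names are hypothetical.

```python
import math

def softmax(scores, temperature):
    """Numerically stable softmax of scores / temperature."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def soft_contrastive_loss(pred_sims, target_sims, positive,
                          lam=0.5, tau=0.1, tau_target=0.05):
    """Cross-entropy between a soft target and the predicted distribution.

    pred_sims:   similarities of the instance to all candidates, from one view
    target_sims: similarities to the same candidates, from the other view
    positive:    index of the instance's own positive among the candidates
    """
    n = len(pred_sims)
    negatives = [j for j in range(n) if j != positive]
    # Relations among negatives, sharpened with a lower temperature (assumption).
    relations = softmax([target_sims[j] for j in negatives], tau_target)
    target = [0.0] * n
    target[positive] = lam
    for j, r in zip(negatives, relations):
        target[j] = (1.0 - lam) * r
    pred = softmax(pred_sims, tau)
    # Skip zero-weight terms so that log is never taken needlessly.
    return -sum(t * math.log(p) for t, p in zip(target, pred) if t > 0.0)

sims = [0.9, 0.1, 0.5]
print(soft_contrastive_loss(sims, sims, positive=0))           # soft target
print(soft_contrastive_loss(sims, sims, positive=0, lam=1.0))  # hard target
```

With `lam=1.0` the target is one-hot and the loss reduces to the usual hard contrastive cross-entropy, which is a quick way to see what the soft term adds.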

CV · Dec 24, 2021
A formal approach to good practices in Pseudo-Labeling for Unsupervised Domain Adaptive Re-Identification

Fabian Dubourvieux, Romaric Audigier, Angélique Loesch et al.

The use of pseudo-labels prevails in order to tackle Unsupervised Domain Adaptive (UDA) Re-Identification (re-ID) with the best performance. Indeed, this family of approaches has given rise to several UDA re-ID specific frameworks, which are effective. In these works, research directions to improve Pseudo-Labeling UDA re-ID performance are varied and mostly based on intuition and experiments: refining pseudo-labels, reducing the impact of errors in pseudo-labels... It can be hard to deduce from them general good practices, which can be implemented in any Pseudo-Labeling method, to consistently improve its performance. To address this key question, a new theoretical view on Pseudo-Labeling UDA re-ID is proposed. The contributions are threefold: (i) A novel theoretical framework for Pseudo-Labeling UDA re-ID, formalized through a new general learning upper-bound on the UDA re-ID performance. (ii) General good practices for Pseudo-Labeling, directly deduced from the interpretation of the proposed theoretical framework, in order to improve the target re-ID performance. (iii) Extensive experiments on challenging person and vehicle cross-dataset re-ID tasks, showing consistent performance improvements for various state-of-the-art methods and various proposed implementations of good practices.

IV · Nov 8, 2021
Feature-enhanced Generation and Multi-modality Fusion based Deep Neural Network for Brain Tumor Segmentation with Missing MR Modalities

Tongxue Zhou, Stéphane Canu, Pierre Vera et al.

Multimodal Magnetic Resonance Imaging (MRI) is necessary for accurate brain tumor segmentation. The main problem is that not all MR modalities are always available in clinical exams. Based on the fact that there is a strong correlation between the MR modalities of the same patient, we propose a novel brain tumor segmentation network for the case where one or more modalities are missing. The proposed network consists of three sub-networks: a feature-enhanced generator, a correlation constraint block and a segmentation network. The feature-enhanced generator uses the available modalities to generate a 3D feature-enhanced image representing the missing modality. The correlation constraint block exploits the multi-source correlation between the modalities and constrains the generator to synthesize a feature-enhanced modality that is coherently correlated with the available modalities. The segmentation network is a multi-encoder U-Net that produces the final brain tumor segmentation. The proposed method is evaluated on the BraTS 2018 dataset. Experimental results demonstrate the effectiveness of the proposed method, which achieves average Dice scores of 82.9, 74.9 and 59.1 on whole tumor, tumor core and enhancing tumor, respectively, across all situations, and outperforms the best method by 3.5%, 17% and 18.2%.

CV · Nov 2, 2021
A Tri-attention Fusion Guided Multi-modal Segmentation Network

Tongxue Zhou, Su Ruan, Pierre Vera et al.

In the field of multimodal segmentation, the correlation between different modalities can be considered for improving the segmentation results. Considering the correlation between different MR modalities, in this paper, we propose a multi-modality segmentation network guided by a novel tri-attention fusion. Our network includes N model-independent encoding paths with N image sources, a tri-attention fusion block, a dual-attention fusion block, and a decoding path. The model independent encoding paths can capture modality-specific features from the N modalities. Considering that not all the features extracted from the encoders are useful for segmentation, we propose to use dual attention based fusion to re-weight the features along the modality and space paths, which can suppress less informative features and emphasize the useful ones for each modality at different positions. Since there exists a strong correlation between different modalities, based on the dual attention fusion block, we propose a correlation attention module to form the tri-attention fusion block. In the correlation attention module, a correlation description block is first used to learn the correlation between modalities and then a constraint based on the correlation is used to guide the network to learn the latent correlated features which are more relevant for segmentation. Finally, the obtained fused feature representation is projected by the decoder to obtain the segmentation results. Our experiment results tested on BraTS 2018 dataset for brain tumor segmentation demonstrate the effectiveness of our proposed method.

CV · Oct 15, 2021
Improving Unsupervised Domain Adaptive Re-Identification via Source-Guided Selection of Pseudo-Labeling Hyperparameters

Fabian Dubourvieux, Angélique Loesch, Romaric Audigier et al.

Unsupervised Domain Adaptation (UDA) for re-identification (re-ID) is a challenging task: to avoid costly annotation of additional data, it aims to transfer knowledge from a domain with annotated data to a domain of interest with only unlabeled data. Pseudo-labeling approaches have proven to be effective for UDA re-ID. However, their effectiveness depends heavily on the choice of some hyperparameters (HP) that affect the generation of pseudo-labels by clustering. The lack of annotation in the domain of interest makes this choice non-trivial. Current approaches simply reuse the same empirical value for all adaptation tasks, regardless of the target data representation, which changes through the pseudo-labeling training phases. As this simplistic choice may limit their performance, we aim to address this issue. We propose new theoretical grounds for HP selection in clustering-based UDA re-ID, as well as a method for automatic and cyclic HP tuning for pseudo-labeling UDA clustering: HyPASS. HyPASS incorporates two modules into pseudo-labeling methods: (i) HP selection based on a labeled source validation set and (ii) conditional domain alignment of feature discriminativeness to improve HP selection based on source samples. Experiments on commonly used person re-ID and vehicle re-ID datasets show that HyPASS consistently improves the best state-of-the-art re-ID methods compared to the commonly used empirical HP setting.

IV · Apr 13, 2021
Latent Correlation Representation Learning for Brain Tumor Segmentation with Missing MRI Modalities

Tongxue Zhou, Stéphane Canu, Pierre Vera et al.

Magnetic Resonance Imaging (MRI) is a widely used imaging technique for assessing brain tumors. Accurately segmenting brain tumors from MR images is key to clinical diagnosis and treatment planning. In addition, multi-modal MR images can provide complementary information for accurate brain tumor segmentation. However, it is common for some imaging modalities to be missing in clinical practice. In this paper, we present a novel brain tumor segmentation algorithm that handles missing modalities. Since there exists a strong correlation between the modalities, a correlation model is proposed to explicitly represent the latent multi-source correlation. Thanks to the obtained correlation representation, the segmentation becomes more robust when a modality is missing. First, the individual representation produced by each encoder is used to estimate the modality-independent parameter. Then, the correlation model transforms all the individual representations into latent multi-source correlation representations. Finally, the correlation representations across modalities are fused via an attention mechanism into a shared representation that emphasizes the most important features for segmentation. We evaluate our model on the BraTS 2018 and BraTS 2019 datasets; it outperforms the current state-of-the-art methods and produces robust results when one or more modalities are missing.

IV · Feb 5, 2021
3D Medical Multi-modal Segmentation Network Guided by Multi-source Correlation Constraint

Tongxue Zhou, Stéphane Canu, Pierre Vera et al.

In the field of multimodal segmentation, the correlation between different modalities can be considered for improving the segmentation results. In this paper, we propose a multi-modality segmentation network with a correlation constraint. Our network includes N model-independent encoding paths with N image sources, a correlation constraint block, a feature fusion block, and a decoding path. The model independent encoding path can capture modality-specific features from the N modalities. Since there exists a strong correlation between different modalities, we first propose a linear correlation block to learn the correlation between modalities, then a loss function is used to guide the network to learn the correlated features based on the linear correlation block. This block forces the network to learn the latent correlated features which are more relevant for segmentation. Considering that not all the features extracted from the encoders are useful for segmentation, we propose to use dual attention based fusion block to recalibrate the features along the modality and spatial paths, which can suppress less informative features and emphasize the useful ones. The fused feature representation is finally projected by the decoder to obtain the segmentation result. Our experiment results tested on BraTS-2018 dataset for brain tumor segmentation demonstrate the effectiveness of our proposed method.

IV · Apr 22, 2020
A review: Deep learning for medical image segmentation using multi-modality fusion

Tongxue Zhou, Su Ruan, Stéphane Canu

Multi-modality is widely used in medical imaging because it can provide multi-information about a target (tumor, organ or tissue). Segmentation using multi-modality consists of fusing this multi-information to improve the segmentation. Recently, deep learning-based approaches have achieved state-of-the-art performance in image classification, segmentation, object detection and tracking tasks. Owing to their self-learning and generalization ability over large amounts of data, deep learning has also recently gained great interest in multi-modal medical image segmentation. In this paper, we give an overview of deep learning-based approaches for the multi-modal medical image segmentation task. First, we introduce the general principles of deep learning and multi-modal medical image segmentation. Second, we present different deep learning network architectures, then analyze their fusion strategies and compare their results. Earlier fusion is commonly used because it is simple and shifts the focus to the subsequent segmentation network architecture. Later fusion, however, pays more attention to the fusion strategy in order to learn the complex relationships between different modalities. In general, compared to earlier fusion, later fusion can give more accurate results if the fusion method is effective enough. We also discuss some common problems in medical image segmentation. Finally, we summarize and provide some perspectives on future research.

IV · Apr 14, 2020
An automatic COVID-19 CT segmentation network using spatial and channel attention mechanism

Tongxue Zhou, Stéphane Canu, Su Ruan

The coronavirus disease (COVID-19) pandemic has had a devastating effect on global public health. Computed Tomography (CT) is an effective tool for COVID-19 screening. It is of great importance to rapidly and accurately segment COVID-19 lesions from CT to support diagnosis and patient monitoring. In this paper, we propose a U-Net based segmentation network using an attention mechanism. As not all the features extracted from the encoders are useful for segmentation, we propose to incorporate an attention mechanism, comprising a spatial and a channel attention, into a U-Net architecture to re-weight the feature representation spatially and channel-wise and capture rich contextual relationships for better feature representation. In addition, the focal Tversky loss is introduced to deal with small lesion segmentation. The experimental results, evaluated on a COVID-19 CT segmentation dataset of 473 CT slices, demonstrate that the proposed method achieves accurate and rapid COVID-19 segmentation. The method takes only 0.29 seconds to segment a single CT slice. The obtained Dice Score, Sensitivity and Specificity are 83.1%, 86.7% and 99.3%, respectively.
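
For readers unfamiliar with the two quantities named above, here is a minimal sketch of the Dice score and of a focal Tversky loss on flattened masks. The parameter values (`alpha`, `beta`, `gamma`) and the exponent convention are assumptions commonly seen in the literature; the abstract does not state which were used.

```python
def dice_score(pred, target, eps=1e-7):
    """Dice overlap between a predicted (soft or binary) mask and a binary target."""
    tp = sum(p * t for p, t in zip(pred, target))
    return (2.0 * tp + eps) / (sum(pred) + sum(target) + eps)

def tversky_index(pred, target, alpha=0.7, beta=0.3, eps=1e-7):
    """Tversky index: alpha weights false negatives, beta weights false positives."""
    tp = sum(p * t for p, t in zip(pred, target))
    fn = sum((1.0 - p) * t for p, t in zip(pred, target))
    fp = sum(p * (1.0 - t) for p, t in zip(pred, target))
    return (tp + eps) / (tp + alpha * fn + beta * fp + eps)

def focal_tversky_loss(pred, target, alpha=0.7, beta=0.3, gamma=0.75):
    """(1 - Tversky)^gamma; gamma < 1 keeps the loss large for nearly correct masks."""
    return (1.0 - tversky_index(pred, target, alpha, beta)) ** gamma

target = [1, 1, 0, 0]
missed = [1, 0, 0, 0]    # one lesion pixel missed (false negative)
spurious = [1, 1, 1, 0]  # one extra pixel predicted (false positive)
print(focal_tversky_loss(missed, target), focal_tversky_loss(spurious, target))
```

With `alpha > beta`, a missed lesion pixel costs more than a spurious one, which is the usual motivation for this loss on small lesions. Setting `alpha = beta = 0.5` makes the Tversky index coincide with the Dice score.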

IV · Mar 19, 2020
Brain tumor segmentation with missing modalities via latent multi-source correlation representation

Tongxue Zhou, Stéphane Canu, Pierre Vera et al.

Multimodal MR images can provide complementary information for accurate brain tumor segmentation. However, it is common for imaging modalities to be missing in clinical practice. Since there exists a strong correlation between the modalities, a novel correlation representation block is proposed to explicitly discover the latent multi-source correlation. Thanks to the obtained correlation representation, the segmentation becomes more robust when modalities are missing. The model parameter estimation module first maps the individual representation produced by each encoder to obtain independent parameters; then, under these parameters, the correlation expression module transforms all the individual representations to form a latent multi-source correlation representation. Finally, the correlation representations across modalities are fused via an attention mechanism into a shared representation that emphasizes the most important features for segmentation. We evaluate our model on the BraTS 2018 dataset; it outperforms the current state-of-the-art method and produces robust results when one or more modalities are missing.

CL · Nov 26, 2019
Doc2Vec on the PubMed corpus: study of a new approach to generate related articles

Emeric Dynomant, Stéfan J. Darmoni, Émeline Lejeune et al.

PubMed is the largest and most used bibliographic database worldwide, hosting more than 26M biomedical publications. One of its useful features is the "similar articles" section, which allows the end-user to find scientific articles linked to the consulted document in terms of context. The aim of this study is to analyze whether it is possible to replace the statistical model PubMed Related Articles (pmra) with a document embedding method. The Doc2Vec algorithm was used to train models that vectorize documents. Six of its parameters were optimised by following a grid-search strategy to train more than 1,900 models. The parameter combination leading to the best accuracy was used to train models on abstracts from the PubMed database. Four evaluation tasks were defined to determine what does or does not influence the proximity between documents for both Doc2Vec and pmra. The two Doc2Vec architectures have different abilities to link documents about a common context. The terminological indexing, words and stems of linked documents are highly similar between pmra and the Doc2Vec PV-DBOW architecture. These algorithms are also more likely to bring together documents of similar size. In contrast, the manual evaluation shows much better results for the pmra algorithm. While the pmra algorithm links documents by explicitly using terminological indexing in its formula, Doc2Vec does not need prior indexing. It can infer relations between documents sharing a similar indexing without any knowledge about them, particularly with the PV-DBOW architecture. However, the human evaluation, which showed no clear agreement between evaluators, calls for future studies to better understand this difference between PV-DBOW and the pmra algorithm.

CV · Oct 2, 2019
Road scenes analysis in adverse weather conditions by polarization-encoded images and adapted deep learning

Rachel Blin, Samia Ainouz, Stéphane Canu et al.

Object detection in road scenes is necessary to develop both autonomous vehicles and driving assistance systems. Although deep neural networks for recognition tasks have shown great performance on conventional images, they fail to detect objects in road scenes under complex acquisition conditions. In contrast, polarization images, which characterize the light wave, can robustly describe important physical properties of an object even under poor illumination or strong reflections. This paper shows how the non-conventional polarimetric imaging modality outperforms classical methods for object detection, especially in adverse weather conditions. The efficiency of the proposed method is mostly due to the high power of polarimetry to discriminate objects by their reflective properties and to the use of deep neural networks for object detection. Our goal in this work is to show that polarimetry brings real added value compared with RGB images for object detection. Experimental results on our own dataset, composed of road-scene images taken during adverse weather conditions, show that polarimetry together with deep learning can improve the state of the art by about 20% to 50% on different detection tasks.

LG · Jul 30, 2019
Kernels on fuzzy sets: an overview

Jorge Guevara, Roberto Hirata, Stéphane Canu

This paper introduces the concept of kernels on fuzzy sets as a similarity measure for $[0,1]$-valued functions, a.k.a. membership functions of fuzzy sets. We define the following classes of kernels: the cross product, the intersection, the non-singleton and the distance-based kernels on fuzzy sets. These kernels apply to machine learning and data science tasks where uncertainty in the data has an ontic or epistemic interpretation.
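
To make the idea concrete, the sketch below shows one possible cross product kernel between two finite fuzzy sets. It sums, over pairs of elements in the two supports, the product of a Gaussian kernel on the elements and a linear kernel on the membership degrees. The representation of a fuzzy set as a dictionary, the choice of the two base kernels and the function names are illustrative assumptions, not definitions taken from the paper.

```python
import math

def gaussian(x, y, width=1.0):
    """Gaussian kernel on real-valued elements."""
    return math.exp(-((x - y) ** 2) / (2.0 * width ** 2))

def cross_product_kernel(fuzzy_a, fuzzy_b, width=1.0):
    """Similarity between two fuzzy sets given as {element: membership degree}.

    Only elements with non-zero membership (the support) contribute.
    """
    return sum(
        gaussian(x, y, width) * mu_x * mu_y
        for x, mu_x in fuzzy_a.items() if mu_x > 0
        for y, mu_y in fuzzy_b.items() if mu_y > 0
    )

low = {0.0: 1.0, 1.0: 0.5}    # hypothetical fuzzy set concentrated near 0
high = {0.5: 0.8, 2.0: 0.2}   # hypothetical fuzzy set spread towards 2
print(cross_product_kernel(low, high), cross_product_kernel(low, low))
```

Because both base kernels are positive definite, the resulting function is symmetric and can be plugged into any kernel method, for example as a precomputed Gram matrix.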

LG · Apr 16, 2019
Learning 3D Navigation Protocols on Touch Interfaces with Cooperative Multi-Agent Reinforcement Learning

Quentin Debard, Jilles Steeve Dibangoye, Stéphane Canu et al.

Using touch devices to navigate in virtual 3D environments such as computer-assisted design (CAD) models or geographical information systems (GIS) is inherently difficult for humans, as the 3D operations have to be performed by the user on a 2D touch surface. This ill-posed problem is classically solved with a fixed, handcrafted interaction protocol, which must be learned by the user. We propose to automatically learn a new interaction protocol that maps a 2D user input to 3D actions in virtual environments using reinforcement learning (RL). A fundamental problem of RL methods is the vast number of interactions often required, which are difficult to come by when humans are involved. To overcome this limitation, we make use of two collaborative agents. The first agent models the human by learning to perform the 2D finger trajectories. The second agent acts as the interaction protocol, interpreting the 2D finger trajectories from the first agent and translating them into 3D operations. We restrict the learned 2D trajectories to be similar to a training set of collected human gestures by performing state representation learning prior to reinforcement learning. This state representation learning is addressed by projecting the gestures into a latent space learned by a variational autoencoder (VAE).

LG · Feb 19, 2018
Learning to recognize touch gestures: recurrent vs. convolutional features and dynamic sampling

Quentin Debard, Christian Wolf, Stéphane Canu et al.

We propose a fully automatic method for learning gestures on large touch devices in a potentially multi-user context. The goal is to learn general models capable of adapting to different gestures, user styles and hardware variations (e.g. device sizes, sampling frequencies and regularities). Based on deep neural networks, our method features a novel dynamic sampling and temporal normalization component, transforming variable-length gestures into fixed-length representations while preserving finger/surface contact transitions, that is, the topology of the signal. This sequential representation is then processed with a convolutional model capable, unlike recurrent networks, of learning hierarchical representations with different levels of abstraction. To demonstrate the interest of the proposed method, we introduce a new touch gesture dataset with 6591 gestures performed by 27 people, which is, to our knowledge, the first of its kind: a publicly available multi-touch gesture dataset for interaction. We also tested our method on a standard dataset for symbolic touch gesture recognition, the MMG dataset, outperforming the state of the art and reporting close to perfect performance.

CV · Sep 21, 2017
A First Derivative Potts Model for Segmentation and Denoising Using ILP

Ruobing Shen, Gerhard Reinelt, Stéphane Canu

Unsupervised image segmentation and denoising are two fundamental tasks in image processing. Usually, graph based models such as multicut are used for segmentation and variational models are employed for denoising. Our approach addresses both problems at the same time. We propose a novel ILP formulation of the first derivative Potts model with the $\ell_1$ data term, where binary variables are introduced to deal with the $\ell_0$ norm of the regularization term. The ILP is then solved by a standard off-the-shelf MIP solver. Numerical experiments are compared with the multicut problem.

CV · Sep 12, 2017
A True $\ell_0$ Approach for Dictionary Learning (original title: Une véritable approche $\ell_0$ pour l'apprentissage de dictionnaire)

Yuan Liu, Stéphane Canu, Paul Honeine et al.

Sparse representation learning has recently achieved great success in signal and image processing, thanks to recent advances in dictionary learning. To this end, the $\ell_0$-norm is often used to control the sparsity level. Nevertheless, optimization problems based on the $\ell_0$-norm are non-convex and NP-hard. For these reasons, relaxation techniques that target approximate solutions (e.g. the $\ell_1$-norm, pursuit strategies) have attracted much attention from researchers. In contrast, this paper considers the exact $\ell_0$-norm optimization problem and shows that it can be solved effectively despite its complexity. The proposed method reformulates the problem as a Mixed-Integer Quadratic Program (MIQP) and obtains the globally optimal solution by applying existing optimization software. Because the main difficulty of this approach is its computational time, two techniques are introduced to improve the computational speed. Finally, our method is applied to image denoising, which shows its feasibility and relevance compared to the state of the art.
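
One common way to write an $\ell_0$-constrained sparse coding step as an MIQP is the big-M reformulation sketched below. The abstract does not give the paper's exact formulation, so everything here is an assumption: the signal $x$, the dictionary $D$ with $p$ atoms, the sparsity budget $T$, and a known bound $M$ on the magnitude of the coefficients.

```latex
\begin{aligned}
\min_{\alpha \in \mathbb{R}^{p},\; z \in \{0,1\}^{p}} \quad
  & \tfrac{1}{2}\,\lVert x - D\alpha \rVert_2^2 \\
\text{s.t.} \quad
  & -M z_j \le \alpha_j \le M z_j, \qquad j = 1,\dots,p, \\
  & \sum_{j=1}^{p} z_j \le T .
\end{aligned}
```

Each binary variable $z_j$ switches coefficient $\alpha_j$ on or off: $z_j = 0$ forces $\alpha_j = 0$, so the last constraint bounds $\lVert \alpha \rVert_0$ by $T$. The objective is quadratic and the constraints are linear, which is the form that off-the-shelf mixed-integer solvers accept.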

LG · Oct 28, 2015
Operator-valued Kernels for Learning from Functional Response Data

Hachem Kadri, Emmanuel Duflos, Philippe Preux et al.

In this paper we consider the problems of supervised classification and regression in the case where attributes and labels are functions: a data is represented by a set of functions, and the label is also a function. We focus on the use of reproducing kernel Hilbert space theory to learn from such functional data. Basic concepts and properties of kernel-based learning are extended to include the estimation of function-valued functions. In this setting, the representer theorem is restated, a set of rigorously defined infinite-dimensional operator-valued kernels that can be valuably applied when the data are functions is described, and a learning algorithm for nonlinear functional data analysis is introduced. The methodology is illustrated through speech and audio signal processing experiments.

ML · Jan 12, 2013
Multiple functional regression with both discrete and continuous covariates

Hachem Kadri, Philippe Preux, Emmanuel Duflos et al.

In this paper we present a nonparametric method for extending functional regression methodology to the situation where more than one functional covariate is used to predict a functional response. Borrowing the idea from Kadri et al. (2010a), the method, which supports mixed discrete and continuous explanatory variables, is based on estimating a function-valued function in reproducing kernel Hilbert spaces by means of positive operator-valued kernels.