CV | Apr 28, 2020
PODNet: Pooled Outputs Distillation for Small-Tasks Incremental Learning
Arthur Douillard, Matthieu Cord, Charles Ollion et al.
Lifelong learning has attracted much attention, but existing works still struggle to fight catastrophic forgetting and accumulate knowledge over long stretches of incremental learning. In this work, we propose PODNet, a model inspired by representation learning. By carefully balancing the compromise between remembering the old classes and learning new ones, PODNet fights catastrophic forgetting, even over very long runs of small incremental tasks, a setting so far unexplored by current works. PODNet innovates on existing art with an efficient spatial-based distillation loss applied throughout the model and a representation comprising multiple proxy vectors for each class. We validate those innovations thoroughly, comparing PODNet with three state-of-the-art models on three datasets: CIFAR100, ImageNet100, and ImageNet1000. Our results showcase a significant advantage of PODNet over existing art, with accuracy gains of 12.10, 6.51, and 2.85 percentage points, respectively. Code is available at https://github.com/arthurdouillard/incremental_learning.pytorch
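The abstract does not spell out the distillation loss, so the following is only a minimal sketch of what a spatial pooling distillation term could look like. It assumes each feature map is pooled by summing over rows and over columns, and that the old and new models are compared by Euclidean distance on those pooled profiles. The function names `pooled_profile` and `pooled_distillation_loss` are hypothetical, and nested lists stand in for tensors.

```python
import math

def pooled_profile(feature_map):
    """Concatenate the row sums and column sums of one H x W feature map."""
    row_sums = [sum(row) for row in feature_map]
    col_sums = [sum(col) for col in zip(*feature_map)]
    return row_sums + col_sums

def pooled_distillation_loss(old_maps, new_maps):
    """Mean Euclidean distance between pooled profiles, averaged over channels.

    old_maps and new_maps are lists of H x W feature maps (one per channel)
    produced by the previous and the current model for the same input.
    """
    total = 0.0
    for old, new in zip(old_maps, new_maps):
        p_old, p_new = pooled_profile(old), pooled_profile(new)
        total += math.sqrt(sum((a - b) ** 2 for a, b in zip(p_old, p_new)))
    return total / len(old_maps)

# One channel, 2 x 2 maps that differ in a single activation.
old_maps = [[[1.0, 2.0], [3.0, 4.0]]]
new_maps = [[[1.0, 2.0], [3.0, 5.0]]]
loss_same = pooled_distillation_loss(old_maps, old_maps)
loss_diff = pooled_distillation_loss(old_maps, new_maps)
```

Pooling before comparing means that only the row and column statistics are constrained, which leaves the new model more freedom than matching every activation would.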
ML | Feb 10, 2022
Diffusion bridges vector quantized Variational AutoEncoders
Max Cohen, Guillaume Quispe, Sylvain Le Corff et al.
Vector Quantized-Variational AutoEncoders (VQ-VAE) are generative models based on discrete latent representations of the data, where inputs are mapped to a finite set of learned embeddings. To generate new samples, an autoregressive prior distribution over the discrete states must be trained separately. This prior is generally very complex and leads to slow generation. In this work, we propose a new model to train the prior and the encoder/decoder networks simultaneously. We build a diffusion bridge between a continuous coded vector and a non-informative prior distribution. The latent discrete states are then given as random functions of these continuous vectors. We show that our model is competitive with the autoregressive prior on the mini-Imagenet and CIFAR datasets and is efficient in both optimization and sampling. Our framework also extends the standard VQ-VAE and enables end-to-end training.
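As a reminder of the quantization step that the standard VQ-VAE relies on (not the diffusion bridge proposed here), the sketch below maps a continuous vector to its nearest entry in a small codebook. The `quantize` helper and the toy codebook values are illustrative assumptions.

```python
def quantize(vector, codebook):
    """Return (index, embedding) of the codebook entry closest to vector."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Deterministic nearest-neighbour lookup in squared Euclidean distance.
    index = min(range(len(codebook)), key=lambda k: sq_dist(vector, codebook[k]))
    return index, codebook[index]

# Toy codebook with three 2-dimensional embeddings.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
index, embedding = quantize([0.9, 1.2], codebook)
```

In the abstract's model the discrete states are instead random functions of the continuous vectors, so this deterministic lookup only illustrates the baseline being extended.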
AI | Sep 20, 2021
Learning Natural Language Generation from Scratch
Alice Martin Donati, Guillaume Quispe, Charles Ollion et al.
This paper introduces TRUncated ReinForcement Learning for Language (TrufLL), an original approach to train conditional language models from scratch by only using reinforcement learning (RL). As RL methods scale poorly to large action spaces, we dynamically truncate the vocabulary space using a generic language model. TrufLL thus enables training a language agent solely by interacting with its environment, without any task-specific prior knowledge; it is only guided by a task-agnostic language model. Interestingly, this approach avoids the dependency on labelled datasets and inherently reduces pre-trained policy flaws such as language or exposure biases. We evaluate TrufLL on two visual question generation tasks, for which we report positive results on performance and language metrics, which we then corroborate with a human evaluation. To our knowledge, it is the first approach that successfully learns a language generation policy (almost) from scratch.
CO | Mar 17, 2021
NEO: Non Equilibrium Sampling on the Orbit of a Deterministic Transform
Achille Thin, Yazid Janati, Sylvain Le Corff et al.
Sampling from a complex distribution $π$ and approximating its intractable normalizing constant $Z$ are challenging problems. In this paper, a novel family of importance samplers (IS) and Markov chain Monte Carlo (MCMC) samplers is derived. Given an invertible map $T$, these schemes combine (with weights) elements from the forward and backward orbits through points sampled from a proposal distribution $ρ$. The map $T$ does not leave the target $π$ invariant, hence the name NEO, standing for Non-Equilibrium Orbits. NEO-IS provides unbiased estimators of the normalizing constant and self-normalized IS estimators of expectations under $π$, while NEO-MCMC combines multiple NEO-IS estimates of the normalizing constant and an iterated sampling-importance resampling mechanism to sample from $π$. For $T$ chosen as a discrete-time integrator of a conformal Hamiltonian system, NEO-IS achieves state-of-the-art performance on difficult benchmarks and NEO-MCMC is able to explore highly multimodal targets. Additionally, we provide detailed theoretical results for both methods. In particular, we show that NEO-MCMC is uniformly geometrically ergodic and establish explicit mixing time estimates under mild conditions.
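For readers unfamiliar with the two kinds of estimators named above, here is a minimal sketch of plain importance sampling, without orbits or the map $T$, so it is not NEO itself. It assumes a toy unnormalized target $\exp(-x^2/2)$, whose normalizing constant is $\sqrt{2π}$, and a wider Gaussian proposal; all function names are hypothetical.

```python
import math
import random

def unnormalized_target(x):
    """Toy unnormalized density exp(-x^2 / 2); its true constant is sqrt(2*pi)."""
    return math.exp(-0.5 * x * x)

def proposal_pdf(x, scale):
    """Density of the zero-mean Gaussian proposal with standard deviation scale."""
    return math.exp(-0.5 * (x / scale) ** 2) / (scale * math.sqrt(2.0 * math.pi))

def importance_sampling(f, n_samples, scale=2.0, seed=0):
    """Return (estimate of Z, self-normalized estimate of E[f(X)] under the target)."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, scale) for _ in range(n_samples)]
    weights = [unnormalized_target(x) / proposal_pdf(x, scale) for x in xs]
    z_hat = sum(weights) / n_samples  # unbiased for the normalizing constant
    expectation = sum(w * f(x) for w, x in zip(weights, xs)) / sum(weights)
    return z_hat, expectation

z_hat, second_moment = importance_sampling(lambda x: x * x, 20000)
```

The average weight is unbiased for the normalizing constant, while the self-normalized ratio is consistent but slightly biased for finite sample sizes.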
LG | Feb 16, 2021
Joint self-supervised blind denoising and noise estimation
Jean Ollion, Charles Ollion, Elisabeth Gassiat et al.
We propose a novel self-supervised image blind denoising approach in which two neural networks jointly predict the clean signal and infer the noise distribution. Assuming that the noisy observations are conditionally independent given the signal, the networks can be jointly trained without clean training data. Therefore, our approach is particularly relevant for biomedical image denoising, where the noise is difficult to model precisely and clean training data are usually unavailable. Our method significantly outperforms current state-of-the-art self-supervised blind denoising algorithms on six publicly available biomedical image datasets. We also show empirically with synthetic noisy data that our model captures the noise distribution efficiently. Finally, the described framework is simple, lightweight and computationally efficient, making it useful in practical cases.
CV | Oct 6, 2020
CoRe: Color Regression for Multicolor Fashion Garments
Alexandre Rame, Arthur Douillard, Charles Ollion
Developing deep networks that analyze fashion garments has many real-world applications. Among all fashion attributes, color is one of the most important yet challenging to detect. Existing approaches are classification-based and thus cannot go beyond the list of discrete predefined color names. In this paper, we handle color detection as a regression problem to predict the exact RGB values. To this end, in addition to a first color classifier, we include a second regression stage for refinement in our newly proposed architecture. This second step combines two attention models: the first depends on the type of clothing, the second depends on the color previously detected by the classifier. Our final prediction is the weighted spatial pooling over the image pixels' RGB values, where the illumination has been corrected. This architecture is modular and easily expanded to detect the RGBs of all colors in a multicolor garment. In our experiments, we show the benefits of each component of our architecture.
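The final pooling step described above can be illustrated with a short sketch, assuming the pixels are given as a flat list of RGB triples and the attention model has already produced one non-negative weight per pixel. The helper name `weighted_color_pooling` is hypothetical, and illumination correction is left out.

```python
def weighted_color_pooling(pixels, attention):
    """Attention-weighted average of per-pixel RGB values.

    pixels    : list of (r, g, b) triples
    attention : list of non-negative weights, one per pixel
    """
    total = sum(attention)
    if total <= 0:
        raise ValueError("attention weights must have a positive sum")
    return tuple(
        sum(w * p[c] for w, p in zip(attention, pixels)) / total
        for c in range(3)
    )

# Two reddish garment pixels receive all the attention; the blue
# background pixel is ignored.
pixels = [(200, 10, 10), (220, 30, 10), (0, 0, 255)]
attention = [0.5, 0.5, 0.0]
rgb = weighted_color_pooling(pixels, attention)
```

Because the output is an average of actual pixel values, the prediction is not limited to a fixed list of color names.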
LG | Jul 15, 2020
The Monte Carlo Transformer: a stochastic self-attention model for sequence prediction
Alice Martin, Charles Ollion, Florian Strub et al.
This paper introduces the Sequential Monte Carlo Transformer, an original approach that naturally captures the distribution of the observations in a transformer architecture. The keys, queries, values and attention vectors of the network are considered as the unobserved stochastic states of its hidden structure. In this generative model, the observation received at each time step is a random function of the past states in a given attention window. In this general state-space setting, we use Sequential Monte Carlo methods to approximate the posterior distributions of the states given the observations, and to estimate the gradient of the log-likelihood. We hence propose a generative model giving a predictive distribution, instead of a single-point estimate.
CV | Jun 24, 2020
Insights from the Future for Continual Learning
Arthur Douillard, Eduardo Valle, Charles Ollion et al.
Continual learning aims to learn tasks sequentially, with (often severe) constraints on the storage of old learning samples, without suffering from catastrophic forgetting. In this work, we propose prescient continual learning, a novel experimental setting, to incorporate existing information about the classes, prior to any training data. Usually, each task in a traditional continual learning setting evaluates the model on present and past classes, the latter with a limited number of training samples. Our setting adds future classes, with no training samples at all. We introduce Ghost Model, a representation-learning-based model for continual learning using ideas from zero-shot learning. A generative model of the representation space in concert with a careful adjustment of the losses allows us to exploit insights from future classes to constrain the spatial arrangement of the past and current classes. Quantitative results on the AwA2 and aP&Y datasets and detailed visualizations showcase the interest of this new setting and the method we propose to address it.
IV | Mar 17, 2020
DistNet: Deep Tracking by displacement regression: application to bacteria growing in the Mother Machine
Jean Ollion, Charles Ollion
The mother machine is a popular microfluidic device that allows long-term time-lapse imaging of thousands of cells in parallel by microscopy. It has become a valuable tool for single-cell level quantitative analysis and characterization of many cellular processes such as gene expression and regulation, mutagenesis or response to antibiotics. The automated and quantitative analysis of the massive amount of data generated by such experiments is now the limiting step. In particular, the segmentation and tracking of bacteria cells imaged in phase-contrast microscopy, with error rates compatible with high-throughput data, is a challenging problem. In this work, we describe a novel formulation of the multi-object tracking problem, in which tracking is performed by a regression of the bacteria's displacement, allowing simultaneous tracking of multiple bacteria, despite their growth and division over time. Our method jointly performs segmentation and tracking, leveraging sequential information to increase segmentation accuracy. We introduce a Deep Neural Network architecture taking advantage of a self-attention mechanism, which yields extremely low tracking and segmentation error rates. We demonstrate superior performance and speed compared to state-of-the-art methods. Our method is named DiSTNet, which stands for DISTance+DISplacement Segmentation and Tracking Network. While this method is particularly well suited for mother machine microscopy data, its general joint tracking and segmentation formulation could be applied to many other problems with different geometries.
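To make the idea of tracking by displacement regression concrete, here is a minimal sketch of the linking step only, under several assumptions that are not stated in the abstract: cell positions are reduced to one coordinate along the channel, the network's predicted displacement for each current cell is already available, and each current cell is linked to the previous-frame cell nearest to its back-projected position. The name `link_cells` and the sign convention are hypothetical.

```python
def link_cells(prev_centers, curr_centers, displacements):
    """Map each current cell index to the index of its closest predecessor.

    prev_centers  : cell centers in frame t-1 (position along the channel)
    curr_centers  : cell centers in frame t
    displacements : predicted shift of each current cell since frame t-1
    """
    links = {}
    for i, (center, shift) in enumerate(zip(curr_centers, displacements)):
        # Back-project the current cell to where it should have been in t-1.
        estimated_prev = center - shift
        links[i] = min(
            range(len(prev_centers)),
            key=lambda j: abs(prev_centers[j] - estimated_prev),
        )
    return links

prev_centers = [10.0, 30.0]
curr_centers = [11.0, 28.0, 40.0]
displacements = [1.0, -2.0, 10.0]
links = link_cells(prev_centers, curr_centers, displacements)
```

In this toy example two current cells are linked to the same predecessor, which is how a division event would show up under this formulation.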
CV | Dec 6, 2018
OMNIA Faster R-CNN: Detection in the wild through dataset merging and soft distillation
Alexandre Rame, Emilien Garreau, Hedi Ben-Younes et al.
Object detectors tend to perform poorly in new or open domains, and require exhaustive yet costly annotations from fully labeled datasets. We aim at benefiting from several datasets with different categories but without additional labelling, not only to increase the number of categories detected, but also to take advantage of transfer learning and to enhance domain independence. Our dataset merging procedure starts with training several initial Faster R-CNN detectors on the different datasets while considering the complementary datasets' images for domain adaptation. Similarly to self-training methods, the predictions of these initial detectors mitigate the missing annotations on the complementary datasets. The final OMNIA Faster R-CNN is trained with all categories on the union of the datasets enriched by predictions. The joint training handles unsafe targets with a new classification loss called SoftSig in a softly supervised way. Experimental results show that in the case of fashion detection for images in the wild, merging Modanet with COCO increases the final performance from 45.5% to 57.4% in mAP. Applying our soft distillation to the task of detection with domain shift between GTA and Cityscapes beats the state of the art by 5.3 points. Our methodology could unlock object detection for real-world applications without immense datasets.
CV | Sep 27, 2017
Leveraging Weakly Annotated Data for Fashion Image Retrieval and Label Prediction
Charles Corbière, Hedi Ben-Younes, Alexandre Ramé et al.
In this paper, we present a method to learn a visual representation adapted for e-commerce products. Based on weakly supervised learning, our model learns from noisy datasets crawled on e-commerce website catalogs and does not require any manual labeling. We show that our representation can be used for downstream classification tasks over clothing categories with different levels of granularity. We also demonstrate that the learnt representation is suitable for image retrieval. We achieve nearly state-of-the-art results on the DeepFashion In-Shop Clothes Retrieval and Categories Attributes Prediction tasks, without using the provided training set.