Jérémie Bigot

ML
h-index37
12papers
42citations
Novelty50%
AI Score44

12 Papers

ITDec 1, 2017
Estimation of linear operators from scattered impulse responses

Jérémie Bigot, Paul Escande, Pierre Weiss

We provide a new estimator of integral operators with smooth kernels, obtained from a set of scattered and noisy impulse responses. The proposed approach relies on the formalism of smoothing in reproducing kernel Hilbert spaces and on the choice of an appropriate regularization term that takes the smoothness of the operator into account. It is numerically tractable in very large dimensions. We study the estimator's robustness to noise and analyze its approximation properties with respect to the size and the geometry of the dataset. In addition, we show minimax optimality of the proposed estimator.

MLJul 24, 2024
Low dimensional representation of multi-patient flow cytometry datasets using optimal transport for minimal residual disease detection in leukemia

Erell Gachon, Jérémie Bigot, Elsa Cazelles et al.

Representing and quantifying Minimal Residual Disease (MRD) in Acute Myeloid Leukemia (AML), a type of cancer that affects the blood and bone marrow, is essential in the prognosis and follow-up of AML patients. As traditional cytological analysis cannot detect leukemia cells below 5\%, the analysis of flow cytometry dataset is expected to provide more reliable results. In this paper, we explore statistical learning methods based on optimal transport (OT) to achieve a relevant low-dimensional representation of multi-patient flow cytometry measurements (FCM) datasets considered as high-dimensional probability distributions. Using the framework of OT, we justify the use of the K-means algorithm for dimensionality reduction of multiple large-scale point clouds through mean measure quantization by merging all the data into a single point cloud. After this quantization step, the visualization of the intra and inter-patients FCM variability is carried out by embedding low-dimensional quantized probability measures into a linear space using either Wasserstein Principal Component Analysis (PCA) through linearized OT or log-ratio PCA of compositional data. Using a publicly available FCM dataset and a FCM dataset from Bordeaux University Hospital, we demonstrate the benefits of our approach over the popular kernel mean embedding technique for statistical learning from multiple high-dimensional probability distributions. We also highlight the usefulness of our methodology for low-dimensional projection and clustering patient measurements according to their level of MRD in AML from FCM. In particular, our OT-based approach allows a relevant and informative two-dimensional representation of the results of the FlowSom algorithm, a state-of-the-art method for the detection of MRD in AML using multi-patient FCM.

MLFeb 2
PCA of probability measures: Sparse and Dense sampling regimes

Gachon Erell, Jérémie Bigot, Elsa Cazelles

A common approach to perform PCA on probability measures is to embed them into a Hilbert space where standard functional PCA techniques apply. While convergence rates for estimating the embedding of a single measure from $m$ samples are well understood, the literature has not addressed the setting involving multiple measures. In this paper, we study PCA in a double asymptotic regime where $n$ probability measures are observed, each through $m$ samples. We derive convergence rates of the form $n^{-1/2} + m^{-α}$ for the empirical covariance operator and the PCA excess risk, where $α>0$ depends on the chosen embedding. This characterizes the relationship between the number $n$ of measures and the number $m$ of samples per measure, revealing a sparse (small $m$) to dense (large $m$) transition in the convergence behavior. Moreover, we prove that the dense-regime rate is minimax optimal for the empirical covariance error. Our numerical experiments validate these theoretical rates and demonstrate that appropriate subsampling preserves PCA accuracy while reducing computational cost.

MLJan 19
Wasserstein multivariate auto-regressive models for modeling distributional time series

Yiye Jiang, Jérémie Bigot

This paper is focused on the statistical analysis of data consisting of a collection of multiple series of probability measures that are indexed by distinct time instants and supported over a bounded interval of the real line. By modeling these time-dependent probability measures as random objects in the Wasserstein space, we propose a new auto-regressive model for the statistical analysis of multivariate distributional time series. Using the theory of iterated random function systems, results on the second order stationarity of the solution of such a model are provided. We also propose a consistent estimator for the auto-regressive coefficients of this model. Due to the simplex constraints that we impose on the model coefficients, the proposed estimator that is learned under these constraints, naturally has a sparse structure. The sparsity allows the application of the proposed model in learning a graph of temporal dependency from multivariate distributional time series. We explore the numerical performances of our estimation procedure using simulated data. To shed some light on the benefits of our approach for real data analysis, we also apply this methodology to two data sets, respectively made of observations from age distribution in different countries and those from the bike sharing network in Paris.

MLJul 12, 2022
Wasserstein multivariate auto-regressive models for modeling distributional time series

Yiye Jiang, Jérémie Bigot

This paper is focused on the statistical analysis of data consisting of a collection of multiple series of probability measures that are indexed by distinct time instants and supported over a bounded interval of the real line. By modeling these time-dependent probability measures as random objects in the Wasserstein space, we propose a new auto-regressive model for the statistical analysis of multivariate distributional time series. Using the theory of iterated random function systems, results on the second order stationarity of the solution of such a model are provided. We also propose a consistent estimator for the auto-regressive coefficients of this model. Due to the simplex constraints that we impose on the model coefficients, the proposed estimator that is learned under these constraints, naturally has a sparse structure. The sparsity allows the application of the proposed model in learning a graph of temporal dependency from multivariate distributional time series. We explore the numerical performances of our estimation procedure using simulated data. To shed some light on the benefits of our approach for real data analysis, we also apply this methodology to two data sets, respectively made of observations from age distribution in different countries and those from the bike sharing network in Paris.

LGSep 18, 2025
Stochastic Adaptive Gradient Descent Without Descent

Jean-François Aujol, Jérémie Bigot, Camille Castera

We introduce a new adaptive step-size strategy for convex optimization with stochastic gradient that exploits the local geometry of the objective function only by means of a first-order stochastic oracle and without any hyper-parameter tuning. The method comes from a theoretically-grounded adaptation of the Adaptive Gradient Descent Without Descent method to the stochastic setting. We prove the convergence of stochastic gradient descent with our step-size under various assumptions, and we show that it empirically competes against tuned baselines.

MLFeb 7, 2025
Scalable and consistent embedding of probability measures into Hilbert spaces via measure quantization

Erell Gachon, Elsa Cazelles, Jérémie Bigot

This paper is focused on statistical learning from data that come as probability measures. In this setting, popular approaches consist in embedding such data into a Hilbert space with either Linearized Optimal Transport or Kernel Mean Embedding. However, the cost of computing such embeddings prohibits their direct use in large-scale settings. We study two methods based on measure quantization for approximating input probability measures with discrete measures of small-support size. The first one is based on optimal quantization of each input measure, while the second one relies on mean-measure quantization. We study the consistency of such approximations, and its implication for scalable embeddings of probability measures into a Hilbert space at a low computational cost. We finally illustrate our findings with various numerical experiments.

MLApr 3, 2025
High-dimensional ridge regression with random features for non-identically distributed data with a variance profile

Issa-Mbenard Dabo, Jérémie Bigot

The behavior of the random feature model in the high-dimensional regression framework has become a popular issue of interest in the machine learning literature}. This model is generally considered for feature vectors $x_i = Σ^{1/2} x_i'$, where $x_i'$ is a random vector made of independent and identically distributed (iid) entries, and $Σ$ is a positive definite matrix representing the covariance of the features. In this paper, we move beyond {\CB this standard assumption by studying the performances of the random features model in the setting of non-iid feature vectors}. Our approach is related to the analysis of the spectrum of large random matrices through random matrix theory (RMT) {\CB and free probability} results. We turn to the analysis of non-iid data by using the notion of variance profile {\CB which} is {\CB well studied in RMT.} Our main contribution is then the study of the limits of the training and {\CB prediction} risks associated to the ridge estimator in the random features model when its dimensions grow. We provide asymptotic equivalents of these risks that capture the behavior of ridge regression with random features in a {\CB high-dimensional} framework. These asymptotic equivalents, {\CB which prove to be sharp in numerical experiments}, are retrieved by adapting, to our setting, established results from operator-valued free probability theory. Moreover, {\CB for various classes of random feature vectors that have not been considered so far in the literature}, our approach allows to show the appearance of the double descent phenomenon when the ridge regularization parameter is small enough.

MLJul 16, 2021
Online Graph Topology Learning from Matrix-valued Time Series

Yiye Jiang, Jérémie Bigot, Sofian Maabout

The focus is on the statistical analysis of matrix-valued time series, where data is collected over a network of sensors, typically at spatial locations, over time. Each sensor records a vector of features at each time point, creating a vectorial time series for each sensor. The goal is to identify the dependency structure among these sensors and represent it with a graph. When only one feature per sensor is observed, vector auto-regressive (VAR) models are commonly used to infer Granger causality, resulting in a causal graph. The first contribution extends VAR models to matrix-variate models for the purpose of graph learning. Additionally, two online procedures are proposed for both low and high dimensions, enabling rapid updates of coefficient estimates as new samples arrive. In the high-dimensional setting, a novel Lasso-type approach is introduced, and homotopy algorithms are developed for online learning. An adaptive tuning procedure for the regularization parameter is also provided. Given that the application of auto-regressive models to data typically requires detrending, which is not feasible in an online context, the proposed AR models are augmented by incorporating trend as an additional parameter, with a particular focus on periodic trends. The online algorithms are adapted to these augmented data models, allowing for simultaneous learning of the graph and trend from streaming samples. Numerical experiments using both synthetic and real data demonstrate the effectiveness of the proposed methods.

MLApr 24, 2020
Sensor selection on graphs via data-driven node sub-sampling in network time series

Yiye Jiang, Jérémie Bigot, Sofian Maabout

This paper is concerned by the problem of selecting an optimal sampling set of sensors over a network of time series for the purpose of signal recovery at non-observed sensors with a minimal reconstruction error. The problem is motivated by applications where time-dependent graph signals are collected over redundant networks. In this setting, one may wish to only use a subset of sensors to predict data streams over the whole collection of nodes in the underlying graph. A typical application is the possibility to reduce the power consumption in a network of sensors that may have limited battery supplies. We propose and compare various data-driven strategies to turn off a fixed number of sensors or equivalently to select a sampling set of nodes. We also relate our approach to the existing literature on sensor selection from multivariate data with a (possibly) underlying graph structure. Our methodology combines tools from multivariate time series analysis, graph signal processing, statistical learning in high-dimension and deep learning. To illustrate the performances of our approach, we report numerical experiments on the analysis of real data from bike sharing networks in different cities.

MLJun 4, 2019
Fréchet random forests for metric space valued regression with non euclidean predictors

Louis Capitaine, Jérémie Bigot, Rodolphe Thiébaut et al.

Random forests are a statistical learning method widely used in many areas of scientific research because of its ability to learn complex relationships between input and output variables and also its capacity to handle high-dimensional data. However, current random forest approaches are not flexible enough to handle heterogeneous data such as curves, images and shapes. In this paper, we introduce Fréchet trees and Fréchet random forests, which allow to handle data for which input and output variables take values in general metric spaces. To this end, a new way of splitting the nodes of trees is introduced and the prediction procedures of trees and forests are generalized. Then, random forests out-of-bag error and variable importance score are naturally adapted. A consistency theorem for Fréchet regressogram predictor using data-driven partitions is given and applied to Fréchet purely uniformly random trees. The method is studied through several simulation scenarios on heterogeneous data combining longitudinal, image and scalar data. Finally, one real dataset about air quality is used to illustrate the use of the proposed method in practice.

SOC-PHJul 25, 2018
Data-Oriented Algorithm for Real-Time Estimation of Flow Rates and Flow Directions in a Water Distribution Network

Christophe Dumora, David Auber, Jérémie Bigot et al.

The aim of this paper is to present how data collected from a water distribution network (WDN) can be used to reconstruct flow rate and flow direction all over the network to enhance knowledge and detection of unforeseen events. The methodological approach consists in modeling the WDN and all available sensor data related to the management of such a network in the form of a flow network graph G = (V, E, s, t, c), with V a set of nodes, E a set of edges whose elements are ordered pairs of distinct nodes, s a source node, t a sink node and c a capacity function on edges. Our objective is to reconstruct a real-valued function f(u,v): VxV => R on all the edges E in VxV from partial observations on a small number of nodes V = {1, ..., n}. This reconstruction method consists in a data-driven Ford-Fulkerson maximum-flow problem in a multi-source, multi-sink context using a constrained bidirectional breadth-first search based on Edmonds-Karp method. The innovative approach is its application in the context of smart cities to operate from sensor data, structural data from a geographical information system (GIS) and consumption estimates.