LGMay 25, 2022
Learning from time-dependent streaming data with online stochastic algorithmsAntoine Godichon-Baggioni, Nicklas Werge, Olivier Wintenberger
This paper addresses stochastic optimization in a streaming setting with time-dependent and biased gradient estimates. We analyze several first-order methods, including Stochastic Gradient Descent (SGD), mini-batch SGD, and time-varying mini-batch SGD, along with their Polyak-Ruppert averages. Our non-asymptotic analysis establishes novel heuristics that link dependence, biases, and convexity levels, enabling accelerated convergence. Specifically, our findings demonstrate that (i) time-varying mini-batch SGD methods have the capability to break long- and short-range dependence structures, (ii) biased SGD methods can achieve comparable performance to their unbiased counterparts, and (iii) incorporating Polyak-Ruppert averaging can accelerate the convergence of the stochastic optimization algorithms. To validate our theoretical findings, we conduct a series of experiments using both simulated and real-life time-dependent data.
LGJan 18, 2023
Online (Non-)Convex Learning via Tempered OptimismMaxime Haddouche, Olivier Wintenberger, Benjamin Guedj
Optimistic Online Learning aims to exploit experts conveying reliable information to predict the future. However, such implicit optimism may be challenged when it comes to practical crafting of such experts. A fundamental example consists in approximating a minimiser of the current problem and use it as expert. In the context of dynamic environments, such an expert only conveys partially relevant information as it may lead to overfitting. To tackle this issue, we introduce in this work the \emph{optimistically tempered} (OT) online learning framework designed to handle such imperfect experts. As a first contribution, we show that tempered optimism is a fruitful paradigm for Online Non-Convex Learning by proposing simple, yet powerful modification of Online Gradient and Mirror Descent. Second, we derive a second OT algorithm for convex losses and third, evaluate the practical efficiency of tempered optimism on real-life datasets and a toy experiment.
MLFeb 4
Geometry-Aware Optimal Transport: Fast Intrinsic Dimension and Wasserstein Distance EstimationFerdinand Genans, Olivier Wintenberger
Solving large scale Optimal Transport (OT) in machine learning typically relies on sampling measures to obtain a tractable discrete problem. While the discrete solver's accuracy is controllable, the rate of convergence of the discretization error is governed by the intrinsic dimension of our data. Therefore, the true bottleneck is the knowledge and control of the sampling error. In this work, we tackle this issue by introducing novel estimators for both sampling error and intrinsic dimension. The key finding is a simple, tuning-free estimator of $\text{OT}_c(ρ, \hatρ)$ that utilizes the semi-dual OT functional and, remarkably, requires no OT solver. Furthermore, we derive a fast intrinsic dimension estimator from the multi-scale decay of our sampling error estimator. This framework unlocks significant computational and statistical advantages in practice, enabling us to (i) quantify the convergence rate of the discretization error, (ii) calibrate the entropic regularization of Sinkhorn divergences to the data's intrinsic geometry, and (iii) introduce a novel, intrinsic-dimension-based Richardson extrapolation estimator that strongly debiases Wasserstein distance estimation. Numerical experiments demonstrate that our geometry-aware pipeline effectively mitigates the discretization error bottleneck while maintaining computational efficiency.
MLJun 10, 2025
Asymptotic Normality of Infinite Centered Random Forests -Application to Imbalanced ClassificationMoria Mayala, Erwan Scornet, Charles Tillier et al.
Many classification tasks involve imbalanced data, in which a class is largely underrepresented. Several techniques consists in creating a rebalanced dataset on which a classifier is trained. In this paper, we study theoretically such a procedure, when the classifier is a Centered Random Forests (CRF). We establish a Central Limit Theorem (CLT) on the infinite CRF with explicit rates and exact constant. We then prove that the CRF trained on the rebalanced dataset exhibits a bias, which can be removed with appropriate techniques. Based on an importance sampling (IS) approach, the resulting debiased estimator, called IS-ICRF, satisfies a CLT centered at the prediction function value. For high imbalance settings, we prove that the IS-ICRF estimator enjoys a variance reduction compared to the ICRF trained on the original data. Therefore, our theoretical analysis highlights the benefits of training random forests on a rebalanced dataset (followed by a debiasing procedure) compared to using the original data. Our theoretical results, especially the variance rates and the variance reduction, appear to be valid for Breiman's random forests in our experiments.
LGFeb 7, 2024
Online Learning Approach for Survival AnalysisCamila Fernandez, Pierre Gaillard, Joseph de Vilmarest et al.
We introduce an online mathematical framework for survival analysis, allowing real time adaptation to dynamic environments and censored data. This framework enables the estimation of event time distributions through an optimal second order online convex optimization algorithm-Online Newton Step (ONS). This approach, previously unexplored, presents substantial advantages, including explicit algorithms with non-asymptotic convergence guarantees. Moreover, we analyze the selection of ONS hyperparameters, which depends on the exp-concavity property and has a significant influence on the regret bound. We propose a stochastic approach that guarantees logarithmic stochastic regret for ONS. Additionally, we introduce an adaptive aggregation method that ensures robustness in hyperparameter selection while maintaining fast regret bounds. The findings of this paper can extend beyond the survival analysis field, and are relevant for any case characterized by poor exp-concavity and unstable ONS. Finally, these assertions are illustrated by simulation experiments.
LGSep 15, 2021
Non-Asymptotic Analysis of Stochastic Approximation Algorithms for Streaming DataAntoine Godichon-Baggioni, Nicklas Werge, Olivier Wintenberger
We introduce a streaming framework for analyzing stochastic approximation/optimization problems. This streaming framework is analogous to solving optimization problems using time-varying mini-batches that arrive sequentially. We provide non-asymptotic convergence rates of various gradient-based algorithms; this includes the famous Stochastic Gradient (SG) descent (a.k.a. Robbins-Monro algorithm), mini-batch SG and time-varying mini-batch SG algorithms, as well as their iterated averages (a.k.a. Polyak-Ruppert averaging). We show i) how to accelerate convergence by choosing the learning rate according to the time-varying mini-batches, ii) that Polyak-Ruppert averaging achieves optimal convergence in terms of attaining the Cramer-Rao lower bound, and iii) how time-varying mini-batches together with Polyak-Ruppert averaging can provide variance reduction and accelerate convergence simultaneously, which is advantageous for many learning problems, such as online, sequential, and large-scale learning. We further demonstrate these favorable effects for various time-varying mini-batches.
LGApr 16, 2021
Viking: Variational Bayesian Variance TrackingJoseph de Vilmarest, Olivier Wintenberger
We consider the problem of time series forecasting in an adaptive setting. We focus on the inference of state-space models under unknown and potentially time-varying noise variances. We introduce an augmented model in which the variances are represented as auxiliary gaussian latent variables in a tracking mode. As variances are nonnegative, a transformation is chosen and applied to these latent variables. The inference relies on the online variational Bayesian methodology, which consists in minimizing a Kullback-Leibler divergence at each time step. We observe that the minimum of the Kullback-Leibler divergence is an extension of the Kalman filter taking into account the variance uncertainty. We design a novel algorithm, named Viking, using these optimal recursive updates. For auxiliary latent variables, we use second-order bounds whose optimum admit closed-form solutions. Experiments on synthetic data show that Viking behaves well and is robust to misspecification.
LGFeb 1, 2021
Stochastic Online Convex Optimization. Application to probabilistic time series forecastingOlivier Wintenberger
We introduce a general framework of stochastic online convex optimization to obtain fast-rate stochastic regret bounds. We prove that algorithms such as online newton steps and a scale-free 10 version of Bernstein online aggregation achieve best-known rates in unbounded stochastic settings. We apply our approach to calibrate parametric probabilistic forecasters of non-stationary sub-gaussian time series. Our fast-rate stochastic regret bounds are any-time valid. Our proofs combine self-bounded and Poissonnian inequalities for martingales and sub-gaussian random variables, respectively, under a stochastic exp-concavity assumption.
STJun 3, 2020
AdaVol: An Adaptive Recursive Volatility Prediction MethodNicklas Werge, Olivier Wintenberger
Quasi-Maximum Likelihood (QML) procedures are theoretically appealing and widely used for statistical inference. While there are extensive references on QML estimation in batch settings, it has attracted little attention in streaming settings until recently. An investigation of the convergence properties of the QML procedure in a general conditionally heteroscedastic time series model is conducted, and the classical batch optimization routines extended to the framework of streaming and large-scale problems. An adaptive recursive estimation routine for GARCH models named AdaVol is presented. The AdaVol procedure relies on stochastic approximations combined with the technique of Variance Targeting Estimation (VTE). This recursive method has computationally efficient properties, while VTE alleviates some convergence difficulties encountered by the usual QML estimation due to a lack of convexity. Empirical results demonstrate a favorable trade-off between AdaVol's stability and the ability to adapt to time-varying estimates for real-life data.
LGFeb 26, 2020
Kalman Recursions Aggregated OnlineEric Adjakossa, Yannig Goude, Olivier Wintenberger
In this article, we aim at improving the prediction of expert aggregation by using the underlying properties of the models that provide expert predictions. We restrict ourselves to the case where expert predictions come from Kalman recursions, fitting state-space models. By using exponential weights, we construct different algorithms of Kalman recursions Aggregated Online (KAO) that compete with the best expert or the best convex combination of experts in a more or less adaptive way. We improve the existing results on expert aggregation literature when the experts are Kalman recursions by taking advantage of the second-order properties of the Kalman recursions. We apply our approach to Kalman recursions and extend it to the general adversarial expert setting by state-space modeling the errors of the experts. We apply these new algorithms to a real dataset of electricity consumption and show how it can improve forecast performances comparing to other exponentially weighted average procedures.
LGFeb 10, 2020
Stochastic Online Optimization using Kalman RecursionJoseph de Vilmarest, Olivier Wintenberger
We study the Extended Kalman Filter in constant dynamics, offering a bayesian perspective of stochastic optimization. We obtain high probability bounds on the cumulative excess risk in an unconstrained setting. In order to avoid any projection step we propose a two-phase analysis. First, for linear and logistic regressions, we prove that the algorithm enters a local phase where the estimate stays in a small region around the optimum. We provide explicit bounds with high probability on this convergence time. Second, for generalized linear regressions, we provide a martingale analysis of the excess risk in the local phase, improving existing ones in bounded stochastic optimization. The EKF appears as a parameter-free online algorithm with O(d^2) cost per iteration that optimally solves some unconstrained optimization problems.
STJul 4, 2019
Consistent Regression using Data-Dependent CoveringsVincent Margot, Jean-Patrick Baudry, Frédéric Guilloux et al.
In this paper, we introduce a novel method to generate interpretable regression function estimators. The idea is based on called data-dependent coverings. The aim is to extract from the data a covering of the feature space instead of a partition. The estimator predicts the empirical conditional expectation over the cells of the partitions generated from the coverings. Thus, such estimator has the same form as those issued from data-dependent partitioning algorithms. We give sufficient conditions to ensure the consistency, avoiding the sufficient condition of shrinkage of the cells that appears in the former literature. Doing so, we reduce the number of covering elements. We show that such coverings are interpretable and each element of the covering is tagged as significant or insignificant. The proof of the consistency is based on a control of the error of the empirical estimation of conditional expectations which is interesting on its own.
MLJul 1, 2019
Sparse regular variationMeyer Nicolas, Olivier Wintenberger
Regular variation provides a convenient theoretical framework to study large events. In the multivariate setting, the dependence structure of the positive extremes is characterized by a measure - the spectral measure - defined on the positive orthant of the unit sphere. This measure gathers information on the localization of extreme events and has often a sparse support since severe events do not simultaneously occur in all directions. However, it is defined through weak convergence which does not provide a natural way to capture this sparsity structure.In this paper, we introduce the notion of sparse regular variation which allows to better learn the dependence structure of extreme events. This concept is based on the Euclidean projection onto the simplex for which efficient algorithms are known. We prove that under mild assumptions sparse regular variation and regular variation are two equivalent notions and we establish several results for sparsely regularly varying random vectors. Finally, we illustrate on numerical examples how this new concept allows one to detect extremal directions.
LGFeb 26, 2019
Logarithmic Regret for parameter-free Online Logistic RegressionJoseph De Vilmarest, Olivier Wintenberger
We consider online optimization procedures in the context of logistic regression, focusing on the Extended Kalman Filter (EKF). We introduce a second-order algorithm close to the EKF, named Semi-Online Step (SOS), for which we prove a O(log(n)) regret in the adversarial setting, paving the way to similar results for the EKF. This regret bound on SOS is the first for such parameter-free algorithm in the adversarial logistic regression. We prove for the EKF in constant dynamics a O(log(n)) regret in expectation and in the well-specified logistic regression model.
MLJul 12, 2018
Rule Induction Partitioning EstimatorVincent Margot, Jean-Patrick Baudry, Frederic Guilloux et al.
RIPE is a novel deterministic and easily understandable prediction algorithm developed for continuous and discrete ordered data. It infers a model, from a sample, to predict and to explain a real variable $Y$ given an input variable $X \in \mathcal X$ (features). The algorithm extracts a sparse set of hyperrectangles $\mathbf r \subset \mathcal X$, which can be thought of as rules of the form If-Then. This set is then turned into a partition of the features space $\mathcal X$ of which each cell is explained as a list of rules with satisfied their If conditions. The process of RIPE is illustrated on simulated datasets and its efficiency compared with that of other usual algorithms.
STMay 23, 2018
Efficient online algorithms for fast-rate regret bounds under sparsityPierre Gaillard, Olivier Wintenberger
We consider the online convex optimization problem. In the setting of arbitrary sequences and finite set of parameters, we establish a new fast-rate quantile regret bound. Then we investigate the optimization into the L1-ball by discretizing the parameter space. Our algorithm is projection free and we propose an efficient solution by restarting the algorithm on adaptive discretization grids. In the adversarial setting, we develop an algorithm that achieves several rates of convergence with different dependencies on the sparsity of the objective. In the i.i.d. setting, we establish new risk bounds that are adaptive to the sparsity of the problem and to the regularity of the risk (ranging from a rate 1 / $\sqrt T$ for general convex risk to 1 /T for strongly convex risk). These results generalize previous works on sparse online learning. They are obtained under a weak assumption on the risk (Łojasiewicz's assumption) that allows multiple optima which is crucial when dealing with degenerate situations.
LGAug 19, 2016
A Strongly Quasiconvex PAC-Bayesian BoundNiklas Thiemann, Christian Igel, Olivier Wintenberger et al.
We propose a new PAC-Bayesian bound and a way of constructing a hypothesis space, so that the bound is convex in the posterior distribution and also convex in a trade-off parameter between empirical performance of the posterior distribution and its complexity. The complexity is measured by the Kullback-Leibler divergence to a prior. We derive an alternating procedure for minimizing the bound. We show that the bound can be rewritten as a one-dimensional function of the trade-off parameter and provide sufficient conditions under which the function has a single global minimum. When the conditions are satisfied the alternating minimization is guaranteed to converge to the global minimum of the bound. We provide experimental results demonstrating that rigorous minimization of the bound is competitive with cross-validation in tuning the trade-off between complexity and empirical performance. In all our experiments the trade-off turned to be quasiconvex even when the sufficient conditions were violated.
MLApr 4, 2014
Optimal learning with Bernstein Online AggregationOlivier Wintenberger
We introduce a new recursive aggregation procedure called Bernstein Online Aggregation (BOA). The exponential weights include an accuracy term and a second order term that is a proxy of the quadratic variation as in Hazan and Kale (2010). This second term stabilizes the procedure that is optimal in different senses. We first obtain optimal regret bounds in the deterministic context. Then, an adaptive version is the first exponential weights algorithm that exhibits a second order bound with excess losses that appears first in Gaillard et al. (2014). The second order bounds in the deterministic context are extended to a general stochastic context using the cumulative predictive risk. Such conversion provides the main result of the paper, an inequality of a novel type comparing the procedure with any deterministic aggregation procedure for an integrated criteria. Then we obtain an observable estimate of the excess of risk of the BOA procedure. To assert the optimality, we consider finally the iid case for strongly convex and Lipschitz continuous losses and we prove that the optimal rate of aggregation of Tsybakov (2003) is achieved. The batch version of the BOA procedure is then the first adaptive explicit algorithm that satisfies an optimal oracle inequality with high probability.