LGMay 31, 2022
AdaTask: Adaptive Multitask Online LearningPierre Laforgue, Andrea Della Vecchia, Nicolò Cesa-Bianchi et al.
We introduce and analyze AdaTask, a multitask online learning algorithm that adapts to the unknown structure of the tasks. When the $N$ tasks are stochastically activated, we show that the regret of AdaTask is better, by a factor that can be as large as $\sqrt{N}$, than the regret achieved by running $N$ independent algorithms, one for each task. AdaTask can be seen as a comparator-adaptive version of Follow-the-Regularized-Leader with a Mahalanobis norm potential. Through a variational formulation of this potential, our analysis reveals how AdaTask jointly learns the tasks and their structure. Experiments supporting our findings are presented.
SOC-PHJul 25, 2023
A model for efficient dynamical ranking in networksAndrea Della Vecchia, Kibidi Neocosmos, Daniel B. Larremore et al.
We present a physics-inspired method for inferring dynamic rankings in directed temporal networks - networks in which each directed and timestamped edge reflects the outcome and timing of a pairwise interaction. The inferred ranking of each node is real-valued and varies in time as each new edge, encoding an outcome like a win or loss, raises or lowers the node's estimated strength or prestige, as is often observed in real scenarios including sequences of games, tournaments, or interactions in animal hierarchies. Our method works by solving a linear system of equations and requires only one parameter to be tuned. As a result, the corresponding algorithm is scalable and efficient. We test our method by evaluating its ability to predict interactions (edges' existence) and their outcomes (edges' directions) in a variety of applications, including both synthetic and real data. Our analysis shows that in many cases our method's performance is better than existing methods for predicting dynamic rankings and interaction outcomes.
LGApr 17, 2023
Snacks: a fast large-scale kernel SVM solverSofiane Tanji, Andrea Della Vecchia, François Glineur et al.
Kernel methods provide a powerful framework for non parametric learning. They are based on kernel functions and allow learning in a rich functional space while applying linear statistical learning tools, such as Ridge Regression or Support Vector Machines. However, standard kernel methods suffer from a quadratic time and memory complexity in the number of data points and thus have limited applications in large-scale learning. In this paper, we propose Snacks, a new large-scale solver for Kernel Support Vector Machines. Specifically, Snacks relies on a Nyström approximation of the kernel matrix and an accelerated variant of the stochastic subgradient method. We demonstrate formally through a detailed empirical evaluation, that it competes with other SVM solvers on a variety of benchmark datasets.
MLSep 24, 2025
Error Propagation in Dynamic Programming: From Stochastic Control to Option PricingAndrea Della Vecchia, Damir Filipović
This paper investigates theoretical and methodological foundations for stochastic optimal control (SOC) in discrete time. We start formulating the control problem in a general dynamic programming framework, introducing the mathematical structure needed for a detailed convergence analysis. The associate value function is estimated through a sequence of approximations combining nonparametric regression methods and Monte Carlo subsampling. The regression step is performed within reproducing kernel Hilbert spaces (RKHSs), exploiting the classical KRR algorithm, while Monte Carlo sampling methods are introduced to estimate the continuation value. To assess the accuracy of our value function estimator, we propose a natural error decomposition and rigorously control the resulting error terms at each time step. We then analyze how this error propagates backward in time-from maturity to the initial stage-a relatively underexplored aspect of the SOC literature. Finally, we illustrate how our analysis naturally applies to a key financial application: the pricing of American options.
MLMay 20, 2025
Computational Efficiency under Covariate Shift in Kernel Ridge RegressionAndrea Della Vecchia, Arnaud Mavakala Watusadisi, Ernesto De Vito et al.
This paper addresses the covariate shift problem in the context of nonparametric regression within reproducing kernel Hilbert spaces (RKHSs). Covariate shift arises in supervised learning when the input distributions of the training and test data differ, presenting additional challenges for learning. Although kernel methods have optimal statistical properties, their high computational demands in terms of time and, particularly, memory, limit their scalability to large datasets. To address this limitation, the main focus of this paper is to explore the trade-off between computational efficiency and statistical accuracy under covariate shift. We investigate the use of random projections where the hypothesis space consists of a random subspace within a given RKHS. Our results show that, even in the presence of covariate shift, significant computational savings can be achieved without compromising learning performance.
MLDec 4, 2022
Regularized ERM on random subspacesAndrea Della Vecchia, Ernesto De Vito, Lorenzo Rosasco
We study a natural extension of classical empirical risk minimization, where the hypothesis space is a random subspace of a given space. In particular, we consider possibly data dependent subspaces spanned by a random subset of the data, recovering as a special case Nystrom approaches for kernel methods. Considering random subspaces naturally leads to computational savings, but the question is whether the corresponding learning accuracy is degraded. These statistical-computational tradeoffs have been recently explored for the least squares loss and self-concordant loss functions, such as the logistic loss. Here, we work to extend these results to convex Lipschitz loss functions, that might not be smooth, such as the hinge loss used in support vector machines. This unified analysis requires developing new proofs, that use different technical tools, such as sub-gaussian inputs, to achieve fast rates. Our main results show the existence of different settings, depending on how hard the learning problem is, for which computational efficiency can be improved with no loss in performance.
MLJun 17, 2020
The Nyström method for convex loss functionsAndrea Della Vecchia, Ernesto De Vito, Jaouad Mourtada et al.
We investigate an extension of classical empirical risk minimization, where the hypothesis space consists of a random subspace within a given Hilbert space. Specifically, we examine the Nyström method where the subspaces are defined by a random subset of the data. This approach recovers Nyström approximations used in kernel methods as a specific case. Using random subspaces naturally leads to computational advantages, but a key question is whether it compromises the learning accuracy. Recently, the tradeoffs between statistics and computation have been explored for the square loss and self-concordant losses, such as the logistic loss. In this paper, we extend these analyses to general convex Lipschitz losses, which may lack smoothness, such as the hinge loss used in support vector machines. Our main results show the existence of various scenarios where computational gains can be achieved without sacrificing learning performance. When specialized to smooth loss functions, our analysis recovers most previous results. Moreover, it allows to consider classification problems and translate the surrogate risk bounds into classification error bounds. Indeed, this gives the opportunity to compare the effect of Nyström approximations when combined with different loss functions such as the hinge or the square loss.