LGOct 29, 2023
Proving Linear Mode Connectivity of Neural Networks via Optimal TransportDamien Ferbach, Baptiste Goujaud, Gauthier Gidel et al.
The energy landscape of high-dimensional non-convex optimization problems is crucial to understanding the effectiveness of modern deep neural network architectures. Recent works have experimentally shown that two different solutions found after two runs of a stochastic training are often connected by very simple continuous paths (e.g., linear) modulo a permutation of the weights. In this paper, we provide a framework theoretically explaining this empirical observation. Based on convergence rates in Wasserstein distance of empirical measures, we show that, with high probability, two wide enough two-layer neural networks trained with stochastic gradient descent are linearly connected. Additionally, we express upper and lower bounds on the width of each layer of two deep neural networks with independent neuron weights to be linearly connected. Finally, we empirically demonstrate the validity of our approach by showing how the dimension of the support of the weight distribution of neurons, which dictates Wasserstein convergence rates is correlated with linear mode connectivity.
LGJul 15, 2025
Gradient Descent on Logistic Regression: Do Large Step-Sizes Work with Data on the Sphere?Si Yi Meng, Baptiste Goujaud, Antonio Orvieto et al.
Gradient descent (GD) on logistic regression has many fascinating properties. When the dataset is linearly separable, it is known that the iterates converge in direction to the maximum-margin separator regardless of how large the step size is. In the non-separable case, however, it has been shown that GD can exhibit a cycling behaviour even when the step sizes is still below the stability threshold $2/λ$, where $λ$ is the largest eigenvalue of the Hessian at the solution. This short paper explores whether restricting the data to have equal magnitude is a sufficient condition for global convergence, under any step size below the stability threshold. We prove that this is true in a one dimensional space, but in higher dimensions cycling behaviour can still occur. We hope to inspire further studies on quantifying how common these cycles are in realistic datasets, as well as finding sufficient conditions to guarantee global convergence with large step sizes.
OCJan 11, 2022
PEPit: computer-assisted worst-case analyses of first-order optimization methods in PythonBaptiste Goujaud, Céline Moucer, François Glineur et al.
PEPit is a Python package aiming at simplifying the access to worst-case analyses of a large family of first-order optimization methods possibly involving gradient, projection, proximal, or linear optimization oracles, along with their approximate, or Bregman variants. In short, PEPit is a package enabling computer-assisted worst-case analyses of first-order optimization methods. The key underlying idea is to cast the problem of performing a worst-case analysis, often referred to as a performance estimation problem (PEP), as a semidefinite program (SDP) which can be solved numerically. To do that, the package users are only required to write first-order methods nearly as they would have implemented them. The package then takes care of the SDP modeling parts, and the worst-case analysis is performed numerically via a standard solver.
LGDec 10, 2020
A Study of Condition Numbers for First-Order OptimizationCharles Guille-Escuret, Baptiste Goujaud, Manuela Girotti et al.
The study of first-order optimization algorithms (FOA) typically starts with assumptions on the objective functions, most commonly smoothness and strong convexity. These metrics are used to tune the hyperparameters of FOA. We introduce a class of perturbations quantified via a new norm, called *-norm. We show that adding a small perturbation to the objective function has an equivalently small impact on the behavior of any FOA, which suggests that it should have a minor impact on the tuning of the algorithm. However, we show that smoothness and strong convexity can be heavily impacted by arbitrarily small perturbations, leading to excessively conservative tunings and convergence issues. In view of these observations, we propose a notion of continuity of the metrics, which is essential for a robust tuning strategy. Since smoothness and strong convexity are not continuous, we propose a comprehensive study of existing alternative metrics which we prove to be continuous. We describe their mutual relations and provide their guaranteed convergence rates for the Gradient Descent algorithm accordingly tuned. Finally we discuss how our work impacts the theoretical understanding of FOA and their performances.
LGMar 20, 2019
Gradient based sample selection for online continual learningRahaf Aljundi, Min Lin, Baptiste Goujaud et al.
A continual learning agent learns online with a non-stationary and never-ending stream of data. The key to such learning process is to overcome the catastrophic forgetting of previously seen data, which is a well known problem of neural networks. To prevent forgetting, a replay buffer is usually employed to store the previous data for the purpose of rehearsal. Previous works often depend on task boundary and i.i.d. assumptions to properly select samples for the replay buffer. In this work, we formulate sample selection as a constraint reduction problem based on the constrained optimization view of continual learning. The goal is to select a fixed subset of constraints that best approximate the feasible region defined by the original constraints. We show that it is equivalent to maximizing the diversity of samples in the replay buffer with parameters gradient as the feature. We further develop a greedy alternative that is cheap and efficient. The advantage of the proposed method is demonstrated by comparing to other alternatives under the continual learning setting. Further comparisons are made against state of the art methods that rely on task boundaries which show comparable or even better results for our method.
APDec 21, 2017
Robust Detection of Covariate-Treatment Interactions in Clinical TrialsBaptiste Goujaud, Eric W. Tramel, Pierre Courtiol et al.
Detection of interactions between treatment effects and patient descriptors in clinical trials is critical for optimizing the drug development process. The increasing volume of data accumulated in clinical trials provides a unique opportunity to discover new biomarkers and further the goal of personalized medicine, but it also requires innovative robust biomarker detection methods capable of detecting non-linear, and sometimes weak, signals. We propose a set of novel univariate statistical tests, based on the theory of random walks, which are able to capture non-linear and non-monotonic covariate-treatment interactions. We also propose a novel combined test, which leverages the power of all of our proposed univariate tests into a single general-case tool. We present results for both synthetic trials as well as real-world clinical trials, where we compare our method with state-of-the-art techniques and demonstrate the utility and robustness of our approach.