Elisa Riccietti

LG
h-index10
10papers
53citations
Novelty47%
AI Score38

10 Papers

CVJul 28, 2022
Self-supervised learning with rotation-invariant kernels

Léon Zheng, Gilles Puy, Elisa Riccietti et al.

We introduce a regularization loss based on kernel mean embeddings with rotation-invariant kernels on the hypersphere (also known as dot-product kernels) for self-supervised learning of image representations. Besides being fully competitive with the state of the art, our method significantly reduces time and memory complexity for self-supervised training, making it implementable for very large embedding dimensions on existing devices and more easily adjustable than previous methods to settings with limited resources. Our work follows the major paradigm where the model learns to be invariant to some predefined image transformations (cropping, blurring, color jittering, etc.), while avoiding a degenerate solution by regularizing the embedding distribution. Our particular contribution is to propose a loss family promoting the embedding distribution to be close to the uniform distribution on the hypersphere, with respect to the maximum mean discrepancy pseudometric. We demonstrate that this family encompasses several regularizers of former methods, including uniformity-based and information-maximization methods, which are variants of our flexible regularization loss with different kernels. Beyond its practical consequences for state-of-the-art self-supervised learning with limited resources, the proposed generic regularization approach opens perspectives to leverage more widely the literature on kernel methods in order to improve self-supervised learning methods.

MLOct 2, 2023
A path-norm toolkit for modern networks: consequences, promises and challenges

Antoine Gonon, Nicolas Brisebarre, Elisa Riccietti et al.

This work introduces the first toolkit around path-norms that fully encompasses general DAG ReLU networks with biases, skip connections and any operation based on the extraction of order statistics: max pooling, GroupSort etc. This toolkit notably allows us to establish generalization bounds for modern neural networks that are not only the most widely applicable path-norm based ones, but also recover or beat the sharpest known bounds of this type. These extended path-norms further enjoy the usual benefits of path-norms: ease of computation, invariance under the symmetries of the network, and improved sharpness on layered fully-connected networks compared to the product of operator norms, another complexity measure most commonly used. The versatility of the toolkit and its ease of implementation allow us to challenge the concrete promises of path-norm-based generalization bounds, by numerically evaluating the sharpest known bounds for ResNets on ImageNet.

NAJan 15
Introduction to optimization methods for training SciML models

Alena Kopaničáková, Elisa Riccietti

Optimization is central to both modern machine learning (ML) and scientific machine learning (SciML), yet the structure of the underlying optimization problems differs substantially across these domains. Classical ML typically relies on stochastic, sample-separable objectives that favor first-order and adaptive gradient methods. In contrast, SciML often involves physics-informed or operator-constrained formulations in which differential operators induce global coupling, stiffness, and strong anisotropy in the loss landscape. As a result, optimization behavior in SciML is governed by the spectral properties of the underlying physical models rather than by data statistics, frequently limiting the effectiveness of standard stochastic methods and motivating deterministic or curvature-aware approaches. This document provides a unified introduction to optimization methods in ML and SciML, emphasizing how problem structure shapes algorithmic choices. We review first- and second-order optimization techniques in both deterministic and stochastic settings, discuss their adaptation to physics-constrained and data-driven SciML models, and illustrate practical strategies through tutorial examples, while highlighting open research directions at the interface of scientific computing and scientific machine learning.

LGMay 23, 2024
A Rescaling-Invariant Lipschitz Bound Based on Path-Metrics for Modern ReLU Network Parameterizations

Antoine Gonon, Nicolas Brisebarre, Elisa Riccietti et al.

Robustness with respect to weight perturbations underpins guarantees for generalization, pruning and quantization. Existing guarantees rely on Lipschitz bounds in parameter space, cover only plain feed-forward MLPs, and break under the ubiquitous neuron-wise rescaling symmetry of ReLU networks. We prove a new Lipschitz inequality expressed through the $\ell^1$-path-metric of the weights. The bound is (i) rescaling-invariant by construction and (ii) applies to any ReLU-DAG architecture with any combination of convolutions, skip connections, pooling, and frozen (inference-time) batch-normalization -- thus encompassing ResNets, U-Nets, VGG-style CNNs, and more. By respecting the network's natural symmetries, the new bound strictly sharpens prior parameter-space bounds and can be computed in two forward passes. To illustrate its utility, we derive from it a symmetry-aware pruning criterion and show -- through a proof-of-concept experiment on a ResNet-18 trained on ImageNet -- that its pruning performance matches that of classical magnitude pruning, while becoming totally immune to arbitrary neuron-wise rescalings.

LGMar 19, 2025
Mixed precision accumulation for neural network inference guided by componentwise forward error analysis

El-Mehdi El Arar, Silviu-Ioan Filip, Theo Mary et al.

This work proposes a mathematically founded mixed precision accumulation strategy for the inference of neural networks. Our strategy is based on a new componentwise forward error analysis that explains the propagation of errors in the forward pass of neural networks. Specifically, our analysis shows that the error in each component of the output of a layer is proportional to the condition number of the inner product between the weights and the input, multiplied by the condition number of the activation function. These condition numbers can vary widely from one component to the other, thus creating a significant opportunity to introduce mixed precision: each component should be accumulated in a precision inversely proportional to the product of these condition numbers. We propose a practical algorithm that exploits this observation: it first computes all components in low precision, uses this output to estimate the condition numbers, and recomputes in higher precision only the components associated with large condition numbers. We test our algorithm on various networks and datasets and confirm experimentally that it can significantly improve the cost--accuracy tradeoff compared with uniform precision accumulation baselines.

LGMay 23, 2023
A Block-Coordinate Approach of Multi-level Optimization with an Application to Physics-Informed Neural Networks

Serge Gratton, Valentin Mercier, Elisa Riccietti et al.

Multi-level methods are widely used for the solution of large-scale problems, because of their computational advantages and exploitation of the complementarity between the involved sub-problems. After a re-interpretation of multi-level methods from a block-coordinate point of view, we propose a multi-level algorithm for the solution of nonlinear optimization problems and analyze its evaluation complexity. We apply it to the solution of partial differential equations using physics-informed neural networks (PINNs) and show on a few test problems that the approach results in better solutions and significant computational savings

LGOct 4, 2021
Identifiability in Two-Layer Sparse Matrix Factorization

Léon Zheng, Elisa Riccietti, Rémi Gribonval

Sparse matrix factorization is the problem of approximating a matrix $\mathbf{Z}$ by a product of $J$ sparse factors $\mathbf{X}^{(J)} \mathbf{X}^{(J-1)} \ldots \mathbf{X}^{(1)}$. This paper focuses on identifiability issues that appear in this problem, in view of better understanding under which sparsity constraints the problem is well-posed. We give conditions under which the problem of factorizing a matrix into \emph{two} sparse factors admits a unique solution, up to unavoidable permutation and scaling equivalences. Our general framework considers an arbitrary family of prescribed sparsity patterns, allowing us to capture more structured notions of sparsity than simply the count of nonzero entries. These conditions are shown to be related to essential uniqueness of exact matrix decomposition into a sum of rank-one matrices, with structured sparsity constraints. In particular, in the case of fixed-support sparse matrix factorization, we give a general sufficient condition for identifiability based on rank-one matrix completability, and we derive from it a completion algorithm that can verify if this sufficient condition is satisfied, and recover the entries in the two sparse factors if this is the case. A companion paper further exploits these conditions to derive identifiability properties and theoretically sound factorization methods for multi-layer sparse matrix factorization with support constraints associated to some well-known fast transforms such as the Hadamard or the Discrete Fourier Transforms.

LGOct 4, 2021
Efficient Identification of Butterfly Sparse Matrix Factorizations

Léon Zheng, Elisa Riccietti, Rémi Gribonval

Fast transforms correspond to factorizations of the form $\mathbf{Z} = \mathbf{X}^{(1)} \ldots \mathbf{X}^{(J)}$, where each factor $ \mathbf{X}^{(\ell)}$ is sparse and possibly structured. This paper investigates essential uniqueness of such factorizations, i.e., uniqueness up to unavoidable scaling ambiguities. Our main contribution is to prove that any $N \times N$ matrix having the so-called butterfly structure admits an essentially unique factorization into $J$ butterfly factors (where $N = 2^{J}$), and that the factors can be recovered by a hierarchical factorization method, which consists in recursively factorizing the considered matrix into two factors. This hierarchical identifiability property relies on a simple identifiability condition in the two-layer and fixed-support setting. This approach contrasts with existing ones that fit the product of butterfly factors to a given matrix via gradient descent. The proposed method can be applied in particular to retrieve the factorization of the Hadamard or the discrete Fourier transform matrices of size $N=2^J$. Computing such factorizations costs $\mathcal{O}(N^{2})$, which is of the order of dense matrix-vector multiplication, while the obtained factorizations enable fast $\mathcal{O}(N \log N)$ matrix-vector multiplications and have the potential to be applied to compress deep neural networks.

NAApr 9, 2019
On the approximation of the solution of partial differential equations by artificial neural networks trained by a multilevel Levenberg-Marquardt method

Henri Calandra, Serge Gratton, Elisa Riccietti et al.

This paper is concerned with the approximation of the solution of partial differential equations by means of artificial neural networks. Here a feedforward neural network is used to approximate the solution of the partial differential equation. The learning problem is formulated as a least squares problem, choosing the residual of the partial differential equation as a loss function, whereas a multilevel Levenberg-Marquardt method is employed as a training method. This setting allows us to get further insight into the potential of multilevel methods. Indeed, when the least squares problem arises from the training of artificial neural networks, the variables subject to optimization are not related by any geometrical constraints and the standard interpolation and restriction operators cannot be employed any longer. A heuristic, inspired by algebraic multigrid methods, is then proposed to construct the multilevel transfer operators. Numerical experiments show encouraging results related to the efficiency of the new multilevel optimization method for the training of artificial neural networks, compared to the standard corresponding one-level procedure.

NAApr 9, 2019
On high-order multilevel optimization strategies

Henri Calandra, Serge Gratton, Elisa Riccietti et al.

We propose a new family of multilevel methods for unconstrained minimization. The resulting strategies are multilevel extensions of high-order optimization methods based on q-order Taylor models (with q >= 1) that have been recently proposed in the literature. The use of high-order models, while decreasing the worst-case complexity bound, makes these methods computationally more expensive. Hence, to counteract this effect, we propose a multilevel strategy that exploits a hierarchy of problems of decreasing dimension, still approximating the original one, to reduce the global cost of the step computation. A theoretical analysis of the family of methods is proposed. Specifically, local and global convergence results are proved and a complexity bound to reach first order stationary points is also derived. A multilevel version of the well known adaptive method based on cubic regularization (ARC, corresponding to q = 2 in our setting) has been implemented. Numerical experiments clearly highlight the relevance of the new multilevel approach leading to considerable computational savings in terms of floating point operations compared to the classical one-level strategy.