LGFeb 2, 2023Code
Convolutional Neural Operators for robust and accurate learning of PDEsBogdan Raonić, Roberto Molinaro, Tim De Ryck et al.
Although very successfully used in conventional machine learning, convolution based neural network architectures -- believed to be inconsistent in function space -- have been largely ignored in the context of learning solution operators of PDEs. Here, we present novel adaptations for convolutional neural networks to demonstrate that they are indeed able to process functions as inputs and outputs. The resulting architecture, termed as convolutional neural operators (CNOs), is designed specifically to preserve its underlying continuous nature, even when implemented in a discretized form on a computer. We prove a universality theorem to show that CNOs can approximate operators arising in PDEs to desired accuracy. CNOs are tested on a novel suite of benchmarks, encompassing a diverse set of PDEs with possibly multi-scale solutions and are observed to significantly outperform baselines, paving the way for an alternative framework for robust and accurate operator learning. Our code is publicly available at https://github.com/bogdanraonic3/ConvolutionalNeuralOperator
LGSep 29, 2023
Module-wise Training of Neural Networks via the Minimizing Movement SchemeSkander Karkar, Ibrahim Ayed, Emmanuel de Bézenac et al.
Greedy layer-wise or module-wise training of neural networks is compelling in constrained and on-device settings where memory is limited, as it circumvents a number of problems of end-to-end back-propagation. However, it suffers from a stagnation problem, whereby early layers overfit and deeper layers stop increasing the test accuracy after a certain depth. We propose to solve this issue by introducing a module-wise regularization inspired by the minimizing movement scheme for gradient flows in distribution space. We call the method TRGL for Transport Regularized Greedy Learning and study it theoretically, proving that it leads to greedy modules that are regular and that progressively solve the task. Experimentally, we show improved accuracy of module-wise training of various architectures such as ResNets, Transformers and VGG, when our regularization is added, superior to that of other module-wise training methods and often to end-to-end training, with as much as 60% less memory usage.
LGOct 3, 2022
Block-wise Training of Residual Networks via the Minimizing Movement SchemeSkander Karkar, Ibrahim Ayed, Emmanuel de Bézenac et al.
End-to-end backpropagation has a few shortcomings: it requires loading the entire model during training, which can be impossible in constrained settings, and suffers from three locking problems (forward locking, update locking and backward locking), which prohibit training the layers in parallel. Solving layer-wise optimization problems can address these problems and has been used in on-device training of neural networks. We develop a layer-wise training method, particularly welladapted to ResNets, inspired by the minimizing movement scheme for gradient flows in distribution space. The method amounts to a kinetic energy regularization of each block that makes the blocks optimal transport maps and endows them with regularity. It works by alleviating the stagnation problem observed in layer-wise training, whereby greedily-trained early layers overfit and deeper layers stop increasing test accuracy after a certain depth. We show on classification tasks that the test accuracy of block-wise trained ResNets is improved when using our method, whether the blocks are trained sequentially or in parallel.
LGOct 9, 2023
An operator preconditioning perspective on training in physics-informed machine learningTim De Ryck, Florent Bonnet, Siddhartha Mishra et al.
In this paper, we investigate the behavior of gradient descent algorithms in physics-informed machine learning methods like PINNs, which minimize residuals connected to partial differential equations (PDEs). Our key result is that the difficulty in training these models is closely related to the conditioning of a specific differential operator. This operator, in turn, is associated to the Hermitian square of the differential operator of the underlying PDE. If this operator is ill-conditioned, it results in slow or infeasible training. Therefore, preconditioning this operator is crucial. We employ both rigorous mathematical analysis and empirical evaluations to investigate various strategies, explaining how they better condition this critical operator, and consequently improve training.
LGJun 8, 2021Code
LEADS: Learning Dynamical Systems that Generalize Across EnvironmentsYuan Yin, Ibrahim Ayed, Emmanuel de Bézenac et al.
When modeling dynamical systems from real-world data samples, the distribution of data often changes according to the environment in which they are captured, and the dynamics of the system itself vary from one environment to another. Generalizing across environments thus challenges the conventional frameworks. The classical settings suggest either considering data as i.i.d. and learning a single model to cover all situations or learning environment-specific models. Both are sub-optimal: the former disregards the discrepancies between environments leading to biased solutions, while the latter does not exploit their potential commonalities and is prone to scarcity problems. We propose LEADS, a novel framework that leverages the commonalities and discrepancies among known environments to improve model generalization. This is achieved with a tailored training formulation aiming at capturing common dynamics within a shared model while additional terms capture environment-specific dynamics. We ground our approach in theory, exhibiting a decrease in sample complexity with our approach and corroborate these results empirically, instantiating it for linear dynamics. Moreover, we concretize this framework for neural networks and evaluate it experimentally on representative families of nonlinear dynamics. We show that this new setting can exploit knowledge extracted from environment-dependent data and improves generalization for both known and novel environments. Code is available at https://github.com/yuan-yin/LEADS.
MLOct 9, 2020Code
Augmenting Physical Models with Deep Networks for Complex Dynamics ForecastingYuan Yin, Vincent Le Guen, Jérémie Dona et al.
Forecasting complex dynamical phenomena in settings where only partial knowledge of their dynamics is available is a prevalent problem across various scientific fields. While purely data-driven approaches are arguably insufficient in this context, standard physical modeling based approaches tend to be over-simplistic, inducing non-negligible errors. In this work, we introduce the APHYNITY framework, a principled approach for augmenting incomplete physical dynamics described by differential equations with deep data-driven models. It consists in decomposing the dynamics into two components: a physical component accounting for the dynamics for which we have some prior knowledge, and a data-driven component accounting for errors of the physical model. The learning problem is carefully formulated such that the physical model explains as much of the data as possible, while the data-driven component only describes information that cannot be captured by the physical model, no more, no less. This not only provides the existence and uniqueness for this decomposition, but also ensures interpretability and benefits generalization. Experiments made on three important use cases, each representative of a different family of phenomena, i.e. reaction-diffusion equations, wave equations and the non-linear damped pendulum, show that APHYNITY can efficiently leverage approximate physical models to accurately forecast the evolution of the system and correctly identify relevant physical parameters. Code is available at https://github.com/yuan-yin/APHYNITY .
LGMay 31, 2023
Representation Equivalent Neural Operators: a Framework for Alias-free Operator LearningFrancesca Bartolucci, Emmanuel de Bézenac, Bogdan Raonić et al.
Recently, operator learning, or learning mappings between infinite-dimensional function spaces, has garnered significant attention, notably in relation to learning partial differential equations from data. Conceptually clear when outlined on paper, neural operators necessitate discretization in the transition to computer implementations. This step can compromise their integrity, often causing them to deviate from the underlying operators. This research offers a fresh take on neural operators with a framework Representation equivalent Neural Operators (ReNO) designed to address these issues. At its core is the concept of operator aliasing, which measures inconsistency between neural operators and their discrete representations. We explore this for widely-used operator learning techniques. Our findings detail how aliasing introduces errors when handling different discretizations and grids and loss of crucial continuous structures. More generally, this framework not only sheds light on existing challenges but, given its constructive and broad nature, also potentially offers tools for developing new neural operators.
LGMay 25, 2023
Unifying GANs and Score-Based Diffusion as Generative Particle ModelsJean-Yves Franceschi, Mike Gartrell, Ludovic Dos Santos et al.
Particle-based deep generative models, such as gradient flows and score-based diffusion models, have recently gained traction thanks to their striking performance. Their principle of displacing particle distributions using differential equations is conventionally seen as opposed to the previously widespread generative adversarial networks (GANs), which involve training a pushforward generator network. In this paper we challenge this interpretation, and propose a novel framework that unifies particle and adversarial generative models by framing generator training as a generalization of particle models. This suggests that a generator is an optional addition to any such generative model. Consequently, integrating a generator into a score-based diffusion model and training a GAN without a generator naturally emerge from our framework. We empirically test the viability of these original models as proofs of concepts of potential applications of our framework.
LGJun 10, 2021
A Neural Tangent Kernel Perspective of GANsJean-Yves Franceschi, Emmanuel de Bézenac, Ibrahim Ayed et al.
We propose a novel theoretical framework of analysis for Generative Adversarial Networks (GANs). We reveal a fundamental flaw of previous analyses which, by incorrectly modeling GANs' training scheme, are subject to ill-defined discriminator gradients. We overcome this issue which impedes a principled study of GAN training, solving it within our framework by taking into account the discriminator's architecture. To this end, we leverage the theory of infinite-width neural networks for the discriminator via its Neural Tangent Kernel. We characterize the trained discriminator for a wide range of losses and establish general differentiability properties of the network. From this, we derive new insights about the convergence of the generated distribution, advancing our understanding of GANs' training dynamics. We empirically corroborate these results via an analysis toolkit based on our framework, unveiling intuitions that are consistent with GAN practice.
MLSep 17, 2020
A Principle of Least Action for the Training of Neural NetworksSkander Karkar, Ibrahim Ayed, Emmanuel de Bézenac et al.
Neural networks have been achieving high generalization performance on many tasks despite being highly over-parameterized. Since classical statistical learning theory struggles to explain this behavior, much effort has recently been focused on uncovering the mechanisms behind it, in the hope of developing a more adequate theoretical framework and having a better control over the trained models. In this work, we adopt an alternate perspective, viewing the neural network as a dynamical system displacing input particles over time. We conduct a series of experiments and, by analyzing the network's behavior through its displacements, we show the presence of a low kinetic energy displacement bias in the transport map of the network, and link this bias with generalization performance. From this observation, we reformulate the learning problem as follows: finding neural networks which solve the task while transporting the data as efficiently as possible. This offers a novel formulation of the learning problem which allows us to provide regularity results for the solution network, based on Optimal Transport theory. From a practical viewpoint, this allows us to propose a new learning algorithm, which automatically adapts to the complexity of the given task, and leads to networks with a high generalization ability even in low data regimes.
LGJun 4, 2019
Optimal Unsupervised Domain TranslationEmmanuel de Bézenac, Ibrahim Ayed, Patrick Gallinari
Domain Translation is the problem of finding a meaningful correspondence between two domains. Since in a majority of settings paired supervision is not available, much work focuses on Unsupervised Domain Translation (UDT) where data samples from each domain are unpaired. Following the seminal work of CycleGAN for UDT, many variants and extensions of this model have been proposed. However, there is still little theoretical understanding behind their success. We observe that these methods yield solutions which are approximately minimal w.r.t. a given transportation cost, leading us to reformulate the problem in the Optimal Transport (OT) framework. This viewpoint gives us a new perspective on Unsupervised Domain Translation and allows us to prove the existence and uniqueness of the retrieved mapping, given a large family of transport costs. We then propose a novel framework to efficiently compute optimal mappings in a dynamical setting. We show that it generalizes previous methods and enables a more explicit control over the computed optimal mapping. It also provides smooth interpolations between the two domains. Experiments on toy and real world datasets illustrate the behavior of our method.
SYFeb 26, 2019
Learning Dynamical Systems from Partial ObservationsIbrahim Ayed, Emmanuel de Bézenac, Arthur Pajot et al.
We consider the problem of forecasting complex, nonlinear space-time processes when observations provide only partial information of on the system's state. We propose a natural data-driven framework, where the system's dynamics are modelled by an unknown time-varying differential equation, and the evolution term is estimated from the data, using a neural network. Any future state can then be computed by placing the associated differential equation in an ODE solver. We first evaluate our approach on shallow water and Euler simulations. We find that our method not only demonstrates high quality long-term forecasts, but also learns to produce hidden states closely resembling the true states of the system, without direct supervision on the latter. Additional experiments conducted on challenging, state of the art ocean simulations further validate our findings, while exhibiting notable improvements over classical baselines.