Enrique Zuazua

LG
h-index: 9
33 papers
250 citations
Novelty: 46%
AI Score: 53

33 Papers

CLASS-PH Jan 13, 2014
Generation of two-dimensional water waves by moving bottom disturbances

Hayk Nersisyan, Denys Dutykh, Enrique Zuazua

We investigate the potential and limitations of the wave generation by disturbances moving at the bottom. More precisely, we assume that the wavemaker is composed of an underwater object of a given shape which can be displaced according to a prescribed trajectory. We address the practical question of computing the wavemaker shape and trajectory generating a wave with prescribed characteristics. For the sake of simplicity we model the hydrodynamics by a generalized forced Benjamin-Bona-Mahony (BBM) equation. This practical problem is reformulated as a constrained nonlinear optimization problem. Additional constraints are imposed in order to fulfill various practical design requirements. Finally, we present some numerical results in order to demonstrate the feasibility and performance of the proposed methodology.

23.5 NA Jun 1
The Coercivity Gap in Neural PDE Solvers: Parameter Escape and Functional Convergence

Enrique Zuazua

We study neural approximation of elliptic PDE solutions from a variational perspective. The central point is the distinction between the geometry of neural parameters and the convergence of the corresponding physical states. Even when the original elliptic energy is coercive and strictly convex in the natural energy space, its restriction to a nonlinear neural ansatz may fail to be coercive in parameter space. This failure is caused by non-closedness of neural approximation manifolds and by condensation of neurons, which may generate limiting profiles outside the fixed ansatz class. Nevertheless, the associated state functions may remain bounded and converge strongly to the exact PDE solution. We prove this mechanism for Gaussian wave-packet approximations of a prototypical elliptic model in the whole space, derive convergence rates, and explain how the same state-level stability principle applies to residual minimization methods of PINN type, and HYCO-type hybrid methods. We also discuss relaxation and Tikhonov regularization.

AP Aug 1, 2010
Localized solutions for the finite difference semi-discretization of the wave equation

Aurora-Mihaela Marica, Enrique Zuazua

We study the propagation properties of the solutions of the finite-difference space semi-discrete wave equation on a uniform grid of the whole Euclidean space. We provide a construction of high frequency wave packets that propagate along the corresponding bi-characteristic rays of Geometric Optics with a group velocity arbitrarily close to zero. Our analysis is motivated by control theoretical issues. In particular, the continuous wave equation has the so-called observability property: for a sufficiently large time, the total energy of its solutions can be estimated in terms of the energy concentrated in the exterior of a compact set. This fails to be true, uniformly on the mesh-size parameter, for the semi-discrete schemes and the observability constant blows up at an arbitrarily large polynomial order. Our contribution consists in providing a rigorous derivation of those wave packets and in analyzing their behavior near that ray, by taking into account the subtle added dispersive effects that the numerical scheme introduces.
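The vanishing group velocity described in this abstract can be illustrated in the simplest setting. The sketch below assumes the 1-d centered three-point scheme u_j'' = (u_{j+1} - 2 u_j + u_{j-1}) / h^2, a simplification of the multi-dimensional setting of the paper, whose dispersion relation is omega_h(xi) = (2/h) sin(xi h / 2). The function names are illustrative and do not come from the paper.

```python
import math


def dispersion(xi, h):
    """Dispersion relation of the assumed 1-d three-point semi-discrete scheme."""
    return (2.0 / h) * math.sin(xi * h / 2.0)


def group_velocity(xi, h):
    """Derivative of dispersion(xi, h) with respect to the wave number xi."""
    return math.cos(xi * h / 2.0)


h = 0.01                                        # mesh size (illustrative value)
low_speed = group_velocity(1.0, h)              # low frequency: close to the continuous speed 1
high_speed = group_velocity(math.pi / h, h)     # highest grid frequency: speed vanishes
low_omega = dispersion(1.0, h)                  # close to the continuous relation omega = xi
```

At low frequencies the numerical group velocity is close to 1, the speed of the continuous wave equation, while at the highest resolved frequency xi = pi/h it vanishes, which is where slowly propagating wave packets can concentrate.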

OC Dec 15, 2015
Optimal strategies for driving a mobile agent in a guidance by repulsion model

Ramón Escobedo, Aitziber Ibañez, Enrique Zuazua

We present a guidance by repulsion model based on a driver-evader interaction where the driver, assumed to be faster than the evader, follows the evader but cannot be arbitrarily close to it, and the evader tries to move away from the driver beyond a short distance. The key ingredient allowing the driver to guide the evader is that the driver is able to display a circumvention maneuver around the evader, in such a way that the trajectory of the evader is modified in the direction of the repulsion that the driver exerts on the evader. The evader can thus be driven towards any given target or along a sufficiently smooth path by controlling a single discrete parameter acting on the driver's behavior. The control parameter serves both to activate/deactivate the circumvention mode and to select the clockwise/counterclockwise direction of the circumvention maneuver. Assuming that the circumvention mode is more expensive than the pursuit mode, and that the activation of the circumvention mode has a high cost, we formulate an optimal control problem for the optimal strategy to drive the evader to a given target. By means of numerical shooting methods, we find the optimal open-loop control which reduces the number of activations of the circumvention mode to one and which minimizes the time spent in the active mode. Our numerical simulations show that the system is highly sensitive to small variations of the control function, and that the cost function has a nonlinear regime which contributes to the complexity of the behavior of the system, so that a general open-loop control would not be of practical interest. We then propose a feedback control law that corrects deviations while preventing excessive use of the circumvention mode, finding numerically that the feedback law significantly reduces the cost obtained with the open-loop control.

AP Jan 18, 2011
Approximating travelling waves by equilibria of non local equations

Jose M. Arrieta, Maria Lopez-Fernandez, Enrique Zuazua

We consider an evolution equation of parabolic type in R having a travelling wave solution. We perform an appropriate change of variables which transforms the equation into a non local evolution one having a travelling wave solution with zero speed of propagation with exactly the same profile as the original one. We analyze the relation of the new equation with the original one in the entire real line. We also analyze the behavior of the non local problem in a bounded interval with appropriate boundary conditions and show that it has a unique stationary solution which is asymptotically stable for large enough intervals and that converges to the travelling wave as the interval approaches the entire real line. This procedure allows one to compute simultaneously the travelling wave profile and its propagation speed avoiding moving meshes, as we illustrate with several numerical examples.

AP Aug 1, 2010
High frequency wave packets for the Schrödinger equation and its numerical approximations

Aurora-Mihaela Marica, Enrique Zuazua

We build Gaussian wave packets for the linear Schrödinger equation and its finite difference space semi-discretization and illustrate the lack of uniform dispersive properties of the numerical solutions as established in Ignat, Zuazua, Numerical dispersive schemes for the nonlinear Schrödinger equation, SIAM J. Numer. Anal., 47(2) (2009), 1366-1390. It is by now well known that bigrid algorithms provide filtering mechanisms that allow one to recover the uniformity of the dispersive properties as the mesh size goes to zero. We analyze and illustrate numerically how these high frequency wave packets split and propagate under these bigrid filtering mechanisms, depending on how the fine grid/coarse grid filtering is implemented.

AP Aug 1, 2010
Localized solutions and filtering mechanisms for the discontinuous Galerkin semi-discretizations of the 1-d wave equation

Aurora-Mihaela Marica, Enrique Zuazua

We perform a complete Fourier analysis of the semi-discrete 1-d wave equation obtained through a P1 discontinuous Galerkin (DG) approximation of the continuous wave equation on a uniform grid. The resulting system exhibits the interaction of two types of components: a physical one and a spurious one, related to the possible discontinuities that the numerical solution allows. Each dispersion relation contains critical points where the corresponding group velocity vanishes. Following previous constructions, we rigorously build wave packets with arbitrarily small velocity of propagation concentrated either on the physical or on the spurious component. We also develop filtering mechanisms aimed at recovering the uniform velocity of propagation of the continuous solutions. Finally, some applications to numerical approximation issues of control problems are also presented.

29.4 NA May 12
Optimal convergence rates for the finite element approximation of the Sobolev constant

Liviu I. Ignat, Enrique Zuazua

We establish optimal convergence rates for the P1 finite element approximation of the Sobolev constant in arbitrary dimensions $N \geq 2$ and for Lebesgue exponents $1 < p < N$. Our analysis relies on a refined study of the Sobolev deficit in suitable quasi-norms, which have been introduced and utilized in the context of finite element approximations of the $p$-Laplacian. The proof further involves sharp estimates for the finite element approximation of Sobolev minimizers.

LG Aug 13, 2023
Approximate and Weighted Data Reconstruction Attack in Federated Learning

Yongcun Song, Ziqi Wang, Enrique Zuazua

Federated Learning (FL) is a distributed learning paradigm that enables multiple clients to collaborate on building a machine learning model without sharing their private data. Although FL is considered privacy-preserved by design, recent data reconstruction attacks demonstrate that an attacker can recover clients' training data based on the parameters shared in FL. However, most existing methods fail to attack the most widely used horizontal Federated Averaging (FedAvg) scenario, where clients share model parameters after multiple local training steps. To tackle this issue, we propose an interpolation-based approximation method, which makes attacking FedAvg scenarios feasible by generating the intermediate model updates of the clients' local training processes. Then, we design a layer-wise weighted loss function to improve the data quality of reconstruction. We assign different weights to model updates in different layers concerning the neural network structure, with the weights tuned by Bayesian optimization. Finally, experimental results validate the superiority of our proposed approximate and weighted attack (AWA) method over the other state-of-the-art methods, as demonstrated by the substantial improvement in different evaluation metrics for image data reconstructions.

36.3 NA Apr 13
Computational performance of the MMOC in the inverse design of the Doswell frontogenesis equation

Alexandre Francisco, Umberto Biccari, Enrique Zuazua

Inverse design of transport equations can be addressed by using a gradient-adjoint methodology. In this methodology numerical schemes used for the adjoint resolution determine the direction of descent in its iterative algorithm, and consequently the CPU time consumed by the inverse design. As the CPU time constitutes a known bottleneck, it is important to employ light and quick schemes for the adjoint problem. In this regard, we proposed to use the Modified Method of Characteristics (MMOC). Despite not preserving identity conservation, the MMOC is computationally competitive. In this work we investigated the advantage of using the MMOC in comparison with the Lax-Friedrichs and Lax-Wendroff schemes for the inverse design problem. By testing the Doswell frontogenesis equation, we observed that the MMOC can provide more efficient and accurate computation under some simulation conditions.

NA Nov 17, 2011
Convergence rates for dispersive approximation schemes to nonlinear Schrödinger equations

Liviu Ignat, Enrique Zuazua

This article is devoted to the analysis of the convergence rates of several numerical approximation schemes for linear and nonlinear Schrödinger equations on the real line. Recently, the authors have introduced viscous and two-grid numerical approximation schemes that mimic at the discrete level the so-called Strichartz dispersive estimates of the continuous Schrödinger equation. This makes it possible to guarantee the convergence of numerical approximations for initial data in $L^2(\mathbb{R})$, a fact that cannot be proved in the nonlinear setting for standard conservative schemes unless more regularity of the initial data is assumed. In the present article we obtain explicit convergence rates and prove that dispersive schemes fulfilling the Strichartz estimates are better behaved for $H^s(\mathbb{R})$ data if $0 < s < 1/2$. Indeed, while dispersive schemes ensure a polynomial convergence rate, non-dispersive ones only yield logarithmic decay rates.

8.0 OC May 12
HYCO: A Formalism for Hybrid-Cooperative PDE Modelling

Lorenzo Liverani, Enrique Zuazua

We present Hybrid-Cooperative Learning (HYCO), a hybrid modeling framework that integrates physics-based and data-driven models through mutual regularization. Unlike traditional approaches that impose physical constraints directly on synthetic models, HYCO treats both components as co-trained agents nudged toward agreement. This cooperative scheme is naturally parallelizable and demonstrates robustness to sparse and noisy data. Numerical experiments on static and time-dependent benchmark problems show that HYCO can recover accurate solutions and model parameters under ill-posed conditions. The framework admits a game-theoretic interpretation as a Nash equilibrium problem, enabling alternating optimization. This paper is based on the extended preprint: arXiv:2509.14123.

8.0 CR Apr 21
Sherpa.ai Privacy-Preserving Multi-Party Entity Alignment without Intersection Disclosure for Noisy Identifiers

Daniel M. Jimenez-Gutierrez, Enrique Zuazua, Georgios Kellaris et al.

Federated Learning (FL) enables collaborative model training among multiple parties without centralizing raw data. There are two main paradigms in FL: Horizontal FL (HFL), where all participants share the same feature space but hold different samples, and Vertical FL (VFL), where parties possess complementary features for the same set of samples. A prerequisite for VFL training is privacy-preserving entity alignment (PPEA), which establishes a common index of samples across parties (alignment) without revealing which samples are shared between them. Conventional private set intersection (PSI) achieves alignment but leaks intersection membership, exposing sensitive relationships between datasets. The standard private set union (PSU) mitigates this risk by aligning on the union of identifiers rather than the intersection. However, existing approaches are often limited to two parties or lack support for typo-tolerant matching. In this paper, we introduce the Sherpa.ai multi-party PSU protocol for VFL, a PPEA method that hides intersection membership and enables both exact and noisy matching. The protocol generalizes two-party approaches to multiple parties with low communication overhead and offers two variants: an order-preserving version for exact alignment and an unordered version tolerant to typographical and formatting discrepancies. We prove correctness and privacy, analyze communication and computational (exponentiation) complexity, and formalize a universal index mapping from local records to a shared index space. This multi-party PSU offers a scalable, mathematically grounded protocol for PPEA in real-world VFL deployments, such as multi-institutional healthcare disease detection, collaborative risk modeling between banks and insurers, and cross-domain fraud detection between telecommunications and financial institutions, while preserving intersection privacy.

35.0 NA May 6
Hamiltonian Interface Dynamics for Reduced-Order Optimization of Incompressible Mixing

Ziqian Li, Enrique Zuazua

We develop a reduced-order framework for optimizing mixing in two-dimensional incompressible flows. Instead of optimizing the full transport PDE, the method maximizes the length of advected material interfaces, leading to a finite-dimensional Hamiltonian control problem based on parametrized stream functions. We derive the continuous adjoint equations and reduced gradients, and discretize the forward and adjoint dynamics with the implicit midpoint rule. The resulting discrete adjoint is algebraically consistent with the derivative of the fully discrete objective, up to the tolerance of the nonlinear midpoint solves. The approach applies to bounded two-dimensional domains with smooth finite-dimensional stream-function parametrizations. Numerical experiments on cellular-flow and Doswell frontogenesis benchmarks show that the optimized time-dependent Hamiltonians generate near-exponential interface stretching and substantially faster decay of the $\dot{H}^{-1}$ mix-norm, in contrast with the polynomial behavior observed for stationary flows. When evaluated on a common reference transport solver, the interface-based controls produce faster $\dot{H}^{-1}$ decay than a Eulerian Sobolev-norm optimizer under a matched setup, while substantially reducing computational cost. We also identify a limitation of the reduced model: increasing the control basis may further improve the interface-length objective without yielding proportional gains in $\dot{H}^{-1}$ mixing, confirming that interface length is an effective but not fully faithful proxy for mixing in geometrically complex regimes.

LG Dec 18, 2025
Training Together, Diagnosing Better: Federated Learning for Collagen VI-Related Dystrophies

Astrid Brull, Sara Aguti, Véronique Bolduc et al.

The application of Machine Learning (ML) to the diagnosis of rare diseases, such as collagen VI-related dystrophies (COL6-RD), is fundamentally limited by the scarcity and fragmentation of available data. Attempts to expand sampling across hospitals, institutions, or countries with differing regulations face severe privacy, regulatory, and logistical obstacles that are often difficult to overcome. Federated Learning (FL) provides a promising solution by enabling collaborative model training across decentralized datasets while keeping patient data local and private. Here, we report a novel global FL initiative using the Sherpa.ai FL platform, which leverages FL across distributed datasets in two international organizations for the diagnosis of COL6-RD, using collagen VI immunofluorescence microscopy images from patient-derived fibroblast cultures. Our solution resulted in an ML model capable of classifying collagen VI patient images into the three primary pathogenic mechanism groups associated with COL6-RD: exon skipping, glycine substitution, and pseudoexon insertion. This new approach achieved an F1-score of 0.82, outperforming single-organization models (0.57-0.75). These results demonstrate that FL substantially improves diagnostic utility and generalizability compared to isolated institutional models. Beyond enabling more accurate diagnosis, we anticipate that this approach will support the interpretation of variants of uncertain significance and guide the prioritization of sequencing strategies to identify novel pathogenic variants.

35.4 LG May 13
Towards the Next Frontier of LLMs, Training on Private Data: A Cross-Domain Benchmark for Federated Fine-Tuning

Daniel M. Jimenez-Gutierrez, Enrique Zuazua, Georgios Kellaris et al.

The recent success of large language models (LLMs) has been largely driven by vast public datasets. However, the next frontier for LLM development lies beyond public data. Much of the world's most valuable information is private, especially in highly regulated sectors such as healthcare and finance, where data include patient histories or customer communications. Unlocking this data could represent a major leap forward, enabling LLMs with deeper domain expertise and stronger real-world utility. Yet, these data cannot be shared because they are distributed across institutions and constrained by privacy, regulatory, and organizational barriers. Moreover, institutional datasets are typically non-independent and identically distributed (non-IID), differing across sites in population characteristics, data modalities, documentation patterns, and task-specific label distributions. In this paper, we demonstrate a practical approach to unlocking private and distributed institutional data for LLM adaptation through federated collaboration across data silos. Built on the Sherpa.ai Federated Learning platform, our framework enables nodes to jointly fine-tune a shared LLM without exchanging private data. We evaluate this approach through a cross-domain benchmark in healthcare and finance, using four closed-ended question answering and classification datasets: MedQA, MedMCQA, FPB, and FiQA-SA. We compare three parameter-efficient fine-tuning (PEFT) strategies (LoRA, QLoRA, and IA3) across pretrained backbones under non-IID settings reflecting institutional data heterogeneity. Our results show that federated fine-tuning performs close to centralized training and outperforms isolated single-institution learning. From a Green AI perspective, QLoRA and IA3 improve efficiency with limited accuracy degradation, supporting federated PEFT as a viable approach for adapting LLMs where data cannot be shared.

ML Sep 10, 2024
Constructive Universal Approximation and Finite Sample Memorization by Narrow Deep ReLU Networks

Martín Hernández, Enrique Zuazua

We present a fully constructive analysis of deep ReLU neural networks for classification and function approximation tasks. First, we prove that any dataset with $N$ distinct points in $\mathbb{R}^d$ and $M$ output classes can be exactly classified using a multilayer perceptron (MLP) of width $2$ and depth at most $2N + 4M - 1$, with all network parameters constructed explicitly. This result is sharp with respect to width and is interpreted through the lens of simultaneous or ensemble controllability in discrete nonlinear dynamics. Second, we show that these explicit constructions yield uniform bounds on the parameter norms and, in particular, provide upper estimates for minimizers of standard regularized training loss functionals in supervised learning. As the regularization parameter vanishes, the trained networks converge to exact classifiers with bounded norm, explaining the effectiveness of overparameterized training in the small-regularization regime. We also prove a universal approximation theorem in $L^p(Ω; \mathbb{R}_+)$ for any bounded domain $Ω\subset \mathbb{R}^d$ and $p \in [1, \infty)$, using MLPs of fixed width $d + 1$. The proof is constructive, geometrically motivated, and provides explicit estimates on the network depth when the target function belongs to the Sobolev space $W^{1,p}$. We also extend the approximation and depth estimation results to $L^p(Ω; \mathbb{R}^m)$ for any $m \geq 1$. Our results offer a unified and interpretable framework connecting controllability, expressivity, and training dynamics in deep neural networks.

OC Nov 8, 2025
A PDE Perspective on Generative Diffusion Models

Kang Liu, Enrique Zuazua

Score-based diffusion models have emerged as a powerful class of generative methods, achieving state-of-the-art performance across diverse domains. Despite their empirical success, the mathematical foundations of those models remain only partially understood, particularly regarding the stability and consistency of the underlying stochastic and partial differential equations governing their dynamics. In this work, we develop a rigorous partial differential equation (PDE) framework for score-based diffusion processes. Building on the Li--Yau differential inequality for the heat flow, we prove well-posedness and derive sharp $L^p$-stability estimates for the associated score-based Fokker--Planck dynamics, providing a mathematically consistent description of their temporal evolution. Through entropy stability methods, we further show that the reverse-time dynamics of diffusion models concentrate on the data manifold for compactly supported data distributions and a broad class of initialization schemes, with a concentration rate of order $\sqrt{t}$ as $t \to 0$. These results yield a theoretical guarantee that, under exact score guidance, diffusion trajectories return to the data manifold while preserving imitation fidelity. Our findings also provide practical insights for designing diffusion models, including principled criteria for score-function construction, loss formulation, and stopping-time selection. Altogether, this framework provides a quantitative understanding of the trade-off between generative capacity and imitation fidelity, bridging rigorous analysis and model design within a unified mathematical perspective.

LG Feb 26
Fair feature attribution for multi-output prediction: a Shapley-based perspective

Umberto Biccari, Alain Ibáñez de Opakua, José María Mato et al.

In this article, we provide an axiomatic characterization of feature attribution for multi-output predictors within the Shapley framework. While SHAP explanations are routinely computed independently for each output coordinate, the theoretical necessity of this practice has remained unclear. By extending the classical Shapley axioms to vector-valued cooperative games, we establish a rigidity theorem showing that any attribution rule satisfying efficiency, symmetry, dummy player, and additivity must necessarily decompose component-wise across outputs. Consequently, any joint-output attribution rule must relax at least one of the classical Shapley axioms. This result identifies a previously unformalized structural constraint in Shapley-based interpretability, clarifying the precise scope of fairness-consistent explanations in multi-output learning. Numerical experiments on a biomedical benchmark illustrate that multi-output models can yield computational savings in training and deployment, while producing SHAP explanations that remain fully consistent with the component-wise structure imposed by the Shapley axioms.
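The component-wise structure mentioned in this abstract can be seen on a toy example. The sketch below computes exact Shapley values by enumerating player orderings for a hypothetical two-output cooperative game (`toy_game` and the player names are made up for illustration) and compares the joint computation with one run per output coordinate. It only illustrates that the classical formula acts coordinate by coordinate; it is not the paper's rigidity theorem or its experiments.

```python
from itertools import permutations
from math import factorial


def shapley(players, value):
    """Exact Shapley values; `value` maps a frozenset of players to a tuple of floats."""
    dim = len(value(frozenset()))
    phi = {p: [0.0] * dim for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            before = value(coalition)
            coalition = coalition | {p}
            after = value(coalition)
            # accumulate the marginal contribution of p in every output coordinate
            for k in range(dim):
                phi[p][k] += after[k] - before[k]
    n_orders = factorial(len(players))
    return {p: [x / n_orders for x in contrib] for p, contrib in phi.items()}


def toy_game(coalition):
    """Hypothetical vector-valued game with two outputs."""
    out1 = float(len(coalition)) ** 2                                # symmetric in all players
    out2 = 3.0 if "a" in coalition and "b" in coalition else 0.0     # "c" is a dummy player
    return (out1, out2)


players = ["a", "b", "c"]
joint = shapley(players, toy_game)
# one scalar game per output coordinate
per_output = [shapley(players, lambda s, k=k: (toy_game(s)[k],)) for k in range(2)]
```

In this toy game the joint attribution of each player coincides with the attributions obtained independently per output, and the attributions for each output sum to that output's value on the full coalition (efficiency).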

LG Feb 21, 2024
FedADMM-InSa: An Inexact and Self-Adaptive ADMM for Federated Learning

Yongcun Song, Ziqi Wang, Enrique Zuazua

Federated learning (FL) is a promising framework for learning from distributed data while maintaining privacy. The development of efficient FL algorithms encounters various challenges, including heterogeneous data and systems, limited communication capacities, and constrained local computational resources. Recently developed FedADMM methods show great resilience to both data and system heterogeneity. However, they still suffer from performance deterioration if the hyperparameters are not carefully tuned. To address this issue, we propose an inexact and self-adaptive FedADMM algorithm, termed FedADMM-InSa. First, we design an inexactness criterion for the clients' local updates to eliminate the need for empirically setting the local training accuracy. This inexactness criterion can be assessed by each client independently based on its unique condition, thereby reducing the local computational cost and mitigating the undesirable straggle effect. The convergence of the resulting inexact ADMM is proved under the assumption of strongly convex loss functions. Additionally, we present a self-adaptive scheme that dynamically adjusts each client's penalty parameter, enhancing algorithm robustness by mitigating the need for empirical penalty parameter choices for each client. Extensive numerical experiments on both synthetic and real-world datasets are conducted. As validated by some numerical tests, our proposed algorithm can reduce the clients' local computational load significantly and also accelerate the learning process compared to the vanilla FedADMM.
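For orientation, the baseline that this abstract builds on can be sketched as a vanilla consensus ADMM loop. The example below is a minimal sketch under strong assumptions: scalar quadratic client losses f_i(x) = 0.5 * a_i * (x - b_i)^2, exact local solves, and a fixed penalty parameter `rho`. It is not the FedADMM-InSa algorithm, which replaces the exact local solve by an inexactness criterion and adapts the penalty per client. All names and values are illustrative.

```python
def consensus_admm(a, b, rho=1.0, iters=300):
    """Vanilla consensus ADMM for min_x sum_i 0.5 * a[i] * (x - b[i])**2."""
    n = len(a)
    x = [0.0] * n   # local client variables
    y = [0.0] * n   # dual variables, one per client
    z = 0.0         # global (server) variable
    for _ in range(iters):
        # client step: exact minimizer of the local augmented Lagrangian
        x = [(a[i] * b[i] - y[i] + rho * z) / (a[i] + rho) for i in range(n)]
        # server step: average of the shifted local variables
        z = sum(x[i] + y[i] / rho for i in range(n)) / n
        # dual update on each client
        y = [y[i] + rho * (x[i] - z) for i in range(n)]
    return z


a = [1.0, 2.0, 4.0]   # curvature of each client loss (illustrative)
b = [0.0, 3.0, 5.0]   # minimizer of each client loss (illustrative)
z_admm = consensus_admm(a, b)
z_exact = sum(ai * bi for ai, bi in zip(a, b)) / sum(a)   # closed-form global minimizer
```

For these quadratic losses the global minimizer is the curvature-weighted mean of the `b[i]`, which gives a simple check that the iteration reaches consensus on the right value.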

LG Nov 12, 2025
Federated Learning for Pediatric Pneumonia Detection: Enabling Collaborative Diagnosis Without Sharing Patient Data

Daniel M. Jimenez-Gutierrez, Enrique Zuazua, Joaquin Del Rio et al.

Early and accurate pneumonia detection from chest X-rays (CXRs) is clinically critical to expedite treatment and isolation, reduce complications, and curb unnecessary antibiotic use. Although artificial intelligence (AI) substantially improves CXR-based detection, development is hindered by globally distributed data, high inter-hospital variability, and strict privacy regulations (e.g., HIPAA, GDPR) that make centralization impractical. These constraints are compounded by heterogeneous imaging protocols, uneven data availability, and the costs of transferring large medical images across geographically dispersed sites. In this paper, we evaluate Federated Learning (FL) using the Sherpa.ai FL platform, enabling multiple hospitals (nodes) to collaboratively train a CXR classifier for pneumonia while keeping data in place and private. Using the Pediatric Pneumonia Chest X-ray dataset, we simulate cross-hospital collaboration with non-independent and non-identically distributed (non-IID) data, reproducing real-world variability across institutions and jurisdictions. Our experiments demonstrate that collaborative and privacy-preserving training across multiple hospitals via FL led to a dramatic performance improvement achieving 0.900 Accuracy and 0.966 ROC-AUC, corresponding to 47.5% and 50.0% gains over single-hospital models (0.610; 0.644), without transferring any patient CXR. These results indicate that FL delivers high-performing, generalizable, secure and private pneumonia detection across healthcare networks, with data kept local. This is especially relevant for rare diseases, where FL enables secure multi-institutional collaboration without data movement, representing a breakthrough for accelerating diagnosis and treatment development in low-data domains.
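The reported gains can be checked as relative improvements over the single-hospital baselines. The short sketch below assumes that "gain" means (federated - baseline) / baseline, which is an interpretation consistent with the quoted figures rather than a definition stated in the abstract; the helper name is illustrative.

```python
def relative_gain(new, base):
    """Relative improvement of `new` over `base`, in percent."""
    return 100.0 * (new - base) / base


acc_gain = relative_gain(0.900, 0.610)   # accuracy: federated vs single-hospital
auc_gain = relative_gain(0.966, 0.644)   # ROC-AUC: federated vs single-hospital
```

Under this reading the accuracy gain is about 47.5% and the ROC-AUC gain is 50.0%, matching the values quoted in the abstract.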

CR Nov 3, 2025
Federated Cyber Defense: Privacy-Preserving Ransomware Detection Across Distributed Systems

Daniel M. Jimenez-Gutierrez, Enrique Zuazua, Joaquin Del Rio et al.

Detecting malware, especially ransomware, is essential to securing today's interconnected ecosystems, including cloud storage, enterprise file-sharing, and database services. Training high-performing artificial intelligence (AI) detectors requires diverse datasets, which are often distributed across multiple organizations, making centralization necessary. However, centralized learning is often impractical due to security, privacy regulations, data ownership issues, and legal barriers to cross-organizational sharing. Compounding this challenge, ransomware evolves rapidly, demanding models that are both robust and adaptable. In this paper, we evaluate Federated Learning (FL) using the Sherpa.ai FL platform, which enables multiple organizations to collaboratively train a ransomware detection model while keeping raw data local and secure. This paradigm is particularly relevant for cybersecurity companies (including both software and hardware vendors) that deploy ransomware detection or firewall systems across millions of endpoints. In such environments, data cannot be transferred outside the customer's device due to strict security, privacy, or regulatory constraints. Although FL applies broadly to malware threats, we validate the approach using the Ransomware Storage Access Patterns (RanSAP) dataset. Our experiments demonstrate that FL improves ransomware detection accuracy by a relative 9% over server-local models and achieves performance comparable to centralized training. These results indicate that FL offers a scalable, high-performing, and privacy-preserving framework for proactive ransomware detection across organizational and regulatory boundaries.

LG Dec 2, 2024
Representation and Regression Problems in Neural Networks: Relaxation, Generalization, and Numerics

Kang Liu, Enrique Zuazua

In this work, we address three non-convex optimization problems associated with the training of shallow neural networks (NNs) for exact and approximate representation, as well as for regression tasks. Through a mean-field approach, we convexify these problems and, applying a representer theorem, prove the absence of relaxation gaps. We establish generalization bounds for the resulting NN solutions, assessing their predictive performance on test datasets and, analyzing the impact of key hyperparameters on these bounds, propose optimal choices. On the computational side, we examine the discretization of the convexified problems and derive convergence rates. For low-dimensional datasets, these discretized problems are efficiently solvable using the simplex method. For high-dimensional datasets, we propose a sparsification algorithm that, combined with gradient descent for over-parameterized shallow NNs, yields effective solutions to the primal problems.

OCDec 21, 2023
Cluster-based classification with neural ODEs via control

Antonio Álvarez-López, Rafael Orive-Illera, Enrique Zuazua

We address binary classification using neural ordinary differential equations from the perspective of simultaneous control of $N$ data points. We consider a single-neuron architecture with parameters fixed as piecewise constant functions of time. In this setting, the model complexity can be quantified by the number of control switches. Previous work has shown that classification can be achieved using a point-by-point strategy that requires $O(N)$ switches. We propose a new control method that classifies any arbitrary dataset by sequentially steering clusters of $d$ points, thereby reducing the complexity to $O(N/d)$ switches. The optimality of this result, particularly in high dimensions, is supported by some numerical experiments. Our complexity bound is sufficient but often conservative because same-class points tend to appear in larger clusters, simplifying classification. This motivates studying the probability distribution of the number of switches required. We introduce a simple control method that imposes a collinearity constraint on the parameters, and analyze a worst-case scenario where both classes have the same size and all points are i.i.d. Our results highlight the benefits of high-dimensional spaces, showing that classification using constant controls becomes more probable as $d$ increases.

LGFeb 4, 2025
Exact Sequence Interpolation with Transformers

Albert Alcalde, Giovanni Fantuzzi, Enrique Zuazua

We prove that transformers can exactly interpolate datasets of finite input sequences in $\mathbb{R}^d$, $d\geq 2$, with corresponding output sequences of smaller or equal length. Specifically, given $N$ sequences of arbitrary but finite lengths in $\mathbb{R}^d$ and output sequences of lengths $m^1, \dots, m^N \in \mathbb{N}$, we construct a transformer with $\mathcal{O}(\sum_{j=1}^N m^j)$ blocks and $\mathcal{O}(d \sum_{j=1}^N m^j)$ parameters that exactly interpolates the dataset. Our construction provides complexity estimates that are independent of the input sequence length, by alternating feed-forward and self-attention layers and by capitalizing on the clustering effect inherent to the latter. Our novel constructive method also uses low-rank parameter matrices in the self-attention mechanism, a common feature of practical transformer implementations. These results are first established in the hardmax self-attention setting, where the geometric structure permits an explicit and quantitative analysis, and are then extended to the softmax setting. Finally, we demonstrate the applicability of our exact interpolation construction to learning problems, in particular by providing convergence guarantees to a global minimizer under regularized training strategies. Our analysis contributes to the theoretical understanding of transformer models, offering an explanation for their excellent performance in exact sequence-to-sequence interpolation tasks.

LGNov 18, 2024
A Potential Game Perspective in Federated Learning

Kang Liu, Ziqi Wang, Enrique Zuazua

Federated learning (FL) is an emerging paradigm for training machine learning models across distributed clients. Traditionally, in FL settings, a central server assigns training efforts (or strategies) to clients. However, from a market-oriented perspective, clients may independently choose their training efforts based on rational self-interest. To explore this, we propose a potential game framework where each client's payoff is determined by their individual efforts and the rewards provided by the server. The rewards are influenced by the collective efforts of all clients and can be modulated through a reward factor. Our study begins by establishing the existence of Nash equilibria (NEs), followed by an investigation of uniqueness in homogeneous settings. We demonstrate a significant improvement in clients' training efforts at a critical reward factor, identifying it as the optimal choice for the server. Furthermore, we prove the convergence of the best-response algorithm to compute NEs for our FL game. Finally, we apply the training efforts derived from specific NEs to a real-world FL scenario, validating the effectiveness of the identified optimal reward factor.
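The best-response iteration mentioned in the abstract can be illustrated on a toy game. The sketch below is not the paper's model: the quadratic payoff, the `coupling` term, and all parameter values are assumptions chosen only so that the game is an exact potential game with a closed-form best response.

```python
# Toy potential game (hypothetical payoff, not taken from the paper):
#   u_i(e) = reward * e_i + coupling * e_i * sum_{j != i} e_j - 0.5 * costs[i] * e_i**2
# Because the interaction term is symmetric, this is an exact potential game.

def best_response(i, efforts, reward, coupling, costs, e_max=1.0):
    """Effort in [0, e_max] maximizing client i's payoff, others held fixed."""
    others = sum(efforts) - efforts[i]
    unconstrained = (reward + coupling * others) / costs[i]
    # The payoff is concave in e_i, so clipping the stationary point is optimal.
    return min(max(unconstrained, 0.0), e_max)


def best_response_dynamics(reward, coupling, costs, e_max=1.0,
                           tol=1e-10, max_sweeps=1000):
    """Sweep over clients, letting each one best-respond in turn."""
    efforts = [0.0] * len(costs)
    for _ in range(max_sweeps):
        largest_change = 0.0
        for i in range(len(costs)):
            new_effort = best_response(i, efforts, reward, coupling, costs, e_max)
            largest_change = max(largest_change, abs(new_effort - efforts[i]))
            efforts[i] = new_effort
        if largest_change < tol:
            break
    return efforts


# Assumed parameters: three clients with increasing effort costs.
costs = [1.0, 2.0, 4.0]
equilibrium = best_response_dynamics(reward=0.3, coupling=0.1, costs=costs)
```

At the returned point no client can improve by changing its own effort alone, which is the defining property of a Nash equilibrium in this toy setting.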

LGOct 19, 2025
The Sherpa.ai Blind Vertical Federated Learning Paradigm to Minimize the Number of Communications

Alex Acero, Daniel M. Jimenez-Gutierrez, Dario Pighin et al.

Federated Learning (FL) enables collaborative decentralized training across multiple parties (nodes) while keeping raw data private. There are two main paradigms in FL: Horizontal FL (HFL), where all participant nodes share the same feature space but hold different samples, and Vertical FL (VFL), where participants hold complementary features for the same samples. While HFL is widely adopted, VFL is employed in domains where nodes hold complementary features about the same samples. Still, VFL presents a significant limitation: the vast number of communications required during training. This compromises privacy and security, and can lead to high energy consumption, and in some cases, make model training unfeasible due to the high number of communications. In this paper, we introduce Sherpa.ai Blind Vertical Federated Learning (SBVFL), a novel paradigm that leverages a distributed training mechanism enhanced for privacy and security. Decoupling the vast majority of node updates from the server dramatically reduces node-server communication. Experiments show that SBVFL reduces communication by ~99% compared to standard VFL while maintaining accuracy and robustness. Therefore, SBVFL enables practical, privacy-preserving VFL across sensitive domains, including healthcare, finance, manufacturing, aerospace, cybersecurity, and the defense industry.

LGOct 17, 2025
Deep Neural ODE Operator Networks for PDEs

Ziqian Li, Kang Liu, Yongcun Song et al.

Operator learning has emerged as a promising paradigm for developing efficient surrogate models to solve partial differential equations (PDEs). However, existing approaches often overlook the domain knowledge inherent in the underlying PDEs and hence suffer from challenges in capturing temporal dynamics and generalization issues beyond training time frames. This paper introduces a deep neural ordinary differential equation (ODE) operator network framework, termed NODE-ONet, to alleviate these limitations. The framework adopts an encoder-decoder architecture comprising three core components: an encoder that spatially discretizes input functions, a neural ODE capturing latent temporal dynamics, and a decoder reconstructing solutions in physical spaces. Theoretically, error analysis for the encoder-decoder architecture is investigated. Computationally, we propose novel physics-encoded neural ODEs to incorporate PDE-specific physical properties. Such well-designed neural ODEs significantly reduce the framework's complexity while enhancing numerical efficiency, robustness, applicability, and generalization capacity. Numerical experiments on nonlinear diffusion-reaction and Navier-Stokes equations demonstrate high accuracy, computational efficiency, and prediction capabilities beyond training time frames. Additionally, the framework's flexibility to accommodate diverse encoders/decoders and its ability to generalize across related PDE families further underscore its potential as a scalable, physics-encoded tool for scientific machine learning.

OCSep 30, 2025
Machine Learning and Control: Foundations, Advances, and Perspectives

Enrique Zuazua

Control theory of dynamical systems offers a powerful framework for tackling challenges in deep neural networks and other machine learning architectures. We show that concepts such as simultaneous and ensemble controllability offer new insights into the classification and representation properties of deep neural networks while the control and optimization of static systems can be employed to better understand the performance of shallow networks. Inspired by the classical concept of turnpike, we also explore the relationship between dynamic and static neural networks, where depth is traded for width, and the role of transformers as mechanisms for accelerating classical neural network tasks. We also exploit the expressive power of neural networks (exemplified, for instance, by the Universal Approximation Theorem) to develop a novel hybrid modeling methodology, the Hybrid-Cooperative Learning (HYCO), combining mechanics and data-driven methods in a game-theoretic setting. Finally, we describe how classical properties of diffusion processes, long established in the context of partial differential equations, contribute to explaining the success of modern generative artificial intelligence (AI). We present an overview of our recent results in these areas, illustrating how control, machine learning, numerical analysis, and partial differential equations come together to motivate a fertile ground for future research.

CLJun 26, 2024
Clustering in pure-attention hardmax transformers and its role in sentiment analysis

Albert Alcalde, Giovanni Fantuzzi, Enrique Zuazua

Transformers are extremely successful machine learning models whose mathematical properties remain poorly understood. Here, we rigorously characterize the behavior of transformers with hardmax self-attention and normalization sublayers as the number of layers tends to infinity. By viewing such transformers as discrete-time dynamical systems describing the evolution of points in a Euclidean space, and thanks to a geometric interpretation of the self-attention mechanism based on hyperplane separation, we show that the transformer inputs asymptotically converge to a clustered equilibrium determined by special points called leaders. We then leverage this theoretical understanding to solve sentiment analysis problems from language processing using a fully interpretable transformer model, which effectively captures 'context' by clustering meaningless words around leader words carrying the most meaning. Finally, we outline remaining challenges to bridge the gap between the mathematical analysis of transformers and their real-life implementation.
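A minimal sketch of the clustering effect, under simplifying assumptions: the update rule below (each point moves toward the average of the points maximizing its inner-product score, with an identity attention matrix and step `alpha / (1 + alpha)`) is an assumed toy form, and the normalization sublayer from the abstract is omitted. The point set is hypothetical.

```python
# Toy hardmax self-attention dynamics (assumed form, identity attention matrix).

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def hardmax_layer(points, alpha=1.0):
    """One layer: each point moves toward the mean of its hardmax-attended points."""
    step = alpha / (1.0 + alpha)
    updated = []
    for x in points:
        scores = [dot(x, y) for y in points]
        best = max(scores)
        # Hardmax: keep only the points attaining the maximal score.
        attended = [y for y, s in zip(points, scores) if s == best]
        mean = [sum(coords) / len(attended) for coords in zip(*attended)]
        updated.append([xi + step * (mi - xi) for xi, mi in zip(x, mean)])
    return updated


# Hypothetical 2D tokens; the first one has the largest norm.
tokens = [[2.0, 0.0], [1.0, 0.5], [0.5, 1.0], [-1.0, 0.2]]
state = tokens
for _ in range(50):
    state = hardmax_layer(state)
```

In this example the first token attends only to itself and never moves, so it plays the role of a leader, while the second token is drawn toward it layer after layer.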

OCJan 18, 2024
Interplay between depth and width for interpolation in neural ODEs

Antonio Álvarez-López, Arselane Hadj Slimane, Enrique Zuazua

Neural ordinary differential equations (neural ODEs) have emerged as a natural tool for supervised learning from a control perspective, yet a complete understanding of their optimal architecture remains elusive. In this work, we examine the interplay between their width $p$ and number of layer transitions $L$ (effectively the depth $L+1$). Specifically, we assess the model expressivity in terms of its capacity to interpolate either a finite dataset $D$ comprising $N$ pairs of points or two probability measures in $\mathbb{R}^d$ within a Wasserstein error margin $\varepsilon>0$. Our findings reveal a balancing trade-off between $p$ and $L$, with $L$ scaling as $O(1+N/p)$ for dataset interpolation, and $L=O\left(1+(p\varepsilon^d)^{-1}\right)$ for measure interpolation. In the autonomous case, where $L=0$, a separate study is required, which we undertake focusing on dataset interpolation. We address the relaxed problem of $\varepsilon$-approximate controllability and establish an error decay of $\varepsilon\sim O(\log(p)p^{-1/d})$. This decay rate is a consequence of applying a universal approximation theorem to a custom-built Lipschitz vector field that interpolates $D$. In the high-dimensional setting, we further demonstrate that $p=O(N)$ neurons are likely sufficient to achieve exact control.

OCFeb 8, 2022
Turnpike in optimal control of PDEs, ResNets, and beyond

Borjan Geshkovski, Enrique Zuazua

The turnpike property in contemporary macroeconomics asserts that if an economic planner seeks to move an economy from one level of capital to another, then the most efficient path, as long as the planner has enough time, is to rapidly move stock to a level close to the optimal stationary or constant path, then allow for capital to develop along that path until the desired term is nearly reached, at which point the stock ought to be moved to the final target. Motivated in part by its nature as a resource allocation strategy, over the past decade, the turnpike property has also been shown to hold for several classes of partial differential equations arising in mechanics. When formalized mathematically, the turnpike theory corroborates the insights from economics: for an optimal control problem set in a finite-time horizon, optimal controls and corresponding states are close (often exponentially close) during most of the time horizon, except near the initial and final times, to the optimal control and corresponding state for the associated stationary optimal control problem. In particular, the former are mostly constant over time. This fact provides a rigorous meaning to the asymptotic simplification that some optimal control problems appear to enjoy over long time intervals, allowing the consideration of the corresponding stationary problem for computing and applications. We review a slice of the theory developed over the past decade (the controllability of the underlying system is an important ingredient, and can even be used to devise simple turnpike-like strategies which are nearly optimal), and present several novel applications, including, among many others, the characterization of Hamilton-Jacobi-Bellman asymptotics, and stability estimates in deep learning via residual neural networks.

OCAug 6, 2020
Large-time asymptotics in deep learning

Carlos Esteve, Borjan Geshkovski, Dario Pighin et al.

We consider the neural ODE perspective of supervised learning and study the impact of the final time $T$ (which may indicate the depth of a corresponding ResNet) in training. For the classical $L^2$-regularized empirical risk minimization problem, whenever the neural ODE dynamics are homogeneous with respect to the parameters, we show that the training error is at most of the order $\mathcal{O}\left(\frac{1}{T}\right)$. Furthermore, if the loss inducing the empirical risk attains its minimum, the optimal parameters converge to minimal $L^2$-norm parameters which interpolate the dataset. By a natural scaling between $T$ and the regularization hyperparameter $\lambda$ we obtain the same results when $\lambda\searrow0$ and $T$ is fixed. This allows us to stipulate generalization properties in the overparametrized regime, now seen from the large depth, neural ODE perspective. To enhance the polynomial decay, inspired by turnpike theory in optimal control, we propose a learning problem with an additional integral regularization term of the neural ODE trajectory over $[0,T]$. In the setting of $\ell^p$-distance losses, we prove that both the training error and the optimal parameters are at most of the order $\mathcal{O}\left(e^{-\mu t}\right)$ in any $t\in[0,T]$. The aforementioned stability estimates are also shown for continuous space-time neural networks, taking the form of nonlinear integro-differential equations. By using a time-dependent moving grid for discretizing the spatial variable, we demonstrate that these equations provide a framework for addressing ResNets with variable widths.