Zakhar Shumaylov

LG
h-index55
16papers
755citations
Novelty52%
AI Score58

16 Papers

LGOct 9, 2023
Provably Convergent Data-Driven Convex-Nonconvex Regularization

Zakhar Shumaylov, Jeremy Budd, Subhadip Mukherjee et al.

An emerging new paradigm for solving inverse problems is via the use of deep learning to learn a regularizer from data. This leads to high-quality results, but often at the cost of provable guarantees. In this work, we show how well-posedness and convergent regularization arises within the convex-nonconvex (CNC) framework for inverse problems. We introduce a novel input weakly convex neural network (IWCNN) construction to adapt the method of learned adversarial regularization to the CNC framework. Empirically we show that our method overcomes numerical issues of previous adversarial methods.

DSNov 12, 2025
When is a System Discoverable from Data? Discovery Requires Chaos

Zakhar Shumaylov, Peter Zaika, Philipp Scholl et al.

The deep learning revolution has spurred a rise in advances of using AI in sciences. Within physical sciences the main focus has been on discovery of dynamical systems from observational data. Yet the reliability of learned surrogates and symbolic models is often undermined by the fundamental problem of non-uniqueness. The resulting models may fit the available data perfectly, but lack genuine predictive power. This raises the question: under what conditions can the systems governing equations be uniquely identified from a finite set of observations? We show, counter-intuitively, that chaos, typically associated with unpredictability, is crucial for ensuring a system is discoverable in the space of continuous or analytic functions. The prevalence of chaotic systems in benchmark datasets may have inadvertently obscured this fundamental limitation. More concretely, we show that systems chaotic on their entire domain are discoverable from a single trajectory within the space of continuous functions, and systems chaotic on a strange attractor are analytically discoverable under a geometric condition on the attractor. As a consequence, we demonstrate for the first time that the classical Lorenz system is analytically discoverable. Moreover, we establish that analytic discoverability is impossible in the presence of first integrals, common in real-world systems. These findings help explain the success of data-driven methods in inherently chaotic domains like weather forecasting, while revealing a significant challenge for engineering applications like digital twins, where stable, predictable behavior is desired. For these non-chaotic systems, we find that while trajectory data alone is insufficient, certain prior physical knowledge can help ensure discoverability. These findings warrant a critical re-evaluation of the fundamental assumptions underpinning purely data-driven discovery.

LGFeb 6
Diffeomorphism-Equivariant Neural Networks

Josephine Elisabeth Oettinger, Zakhar Shumaylov, Johannes Bostelmann et al.

Incorporating group symmetries via equivariance into neural networks has emerged as a robust approach for overcoming the efficiency and data demands of modern deep learning. While most existing approaches, such as group convolutions and averaging-based methods, focus on compact, finite, or low-dimensional groups with linear actions, this work explores how equivariance can be extended to infinite-dimensional groups. We propose a strategy designed to induce diffeomorphism equivariance in pre-trained neural networks via energy-based canonicalisation. Formulating equivariance as an optimisation problem allows us to access the rich toolbox of already established differentiable image registration methods. Empirical results on segmentation and classification tasks confirm that our approach achieves approximate equivariance and generalises to unseen transformations without relying on extensive data augmentation or retraining.

LGMay 11
Muon is Not That Special: Random or Inverted Spectra Work Just as Well

Zakhar Shumaylov, Nathaël Da Costa, Peter Zaika et al.

The recent empirical success of the Muon optimizer has renewed interest in non-Euclidean optimization, typically justified by similarities with second-order methods, and linear minimization oracle (LMO) theory. In this paper, we challenge this geometric narrative through three contributions, demonstrating that precise geometric structure is not the key factor affecting optimization performance. First, we introduce Freon, a family of optimizers based on Schatten (quasi-)norms, powered by a novel, provably optimal QDWH-based iterative approximation. Freon naturally interpolates between SGD and Muon, while smoothly extrapolating into the quasi-norm regime. Empirically, the best-performing Schatten parameters for GPT-2 lie strictly within the quasi-norm regime, and thus cannot be represented by any unitarily invariant LMO. Second, noting that Freon performs well across a wide range of exponents, we introduce Kaon, an absurd optimizer that replaces singular values with random noise. Despite lacking any coherent geometric structure, Kaon matches Muon's performance and retains classical convergence guarantees, proving that strict adherence to a precise geometry is practically irrelevant. Third, having shown that geometry is not the primary driver of performance, we demonstrate it is instead controlled by two local quantities: alignment and descent potential. Ultimately, each optimizer must tune its step size around these two quantities. While their dynamics are difficult to predict a-priori, evaluating them within a stochastic random feature model yields a precise insight: Muon succeeds not by tracking an ideal global geometry, but by guaranteeing step-size optimality.

IVDec 11, 2024Code
Benchmarking learned algorithms for computed tomography image reconstruction tasks

Maximilian B. Kiss, Ander Biguri, Zakhar Shumaylov et al.

Computed tomography (CT) is a widely used non-invasive diagnostic method in various fields, and recent advances in deep learning have led to significant progress in CT image reconstruction. However, the lack of large-scale, open-access datasets has hindered the comparison of different types of learned methods. To address this gap, we use the 2DeteCT dataset, a real-world experimental computed tomography dataset, for benchmarking machine learning based CT image reconstruction algorithms. We categorize these methods into post-processing networks, learned/unrolled iterative methods, learned regularizer methods, and plug-and-play methods, and provide a pipeline for easy implementation and evaluation. Using key performance metrics, including SSIM and PSNR, our benchmarking results showcase the effectiveness of various algorithms on tasks such as full data reconstruction, limited-angle reconstruction, sparse-angle reconstruction, low-dose reconstruction, and beam-hardening corrected reconstruction. With this benchmarking study, we provide an evaluation of a range of algorithms representative for different categories of learned reconstruction methods on a recently published dataset of real-world experimental CT measurements. The reproducible setup of methods and CT image reconstruction tasks in an open-source toolbox enables straightforward addition and comparison of new methods later on. The toolbox also provides the option to load the 2DeteCT dataset differently for extensions to other problems and different CT reconstruction tasks.

CEMay 7
Adaptive Coordinate Transforms for Neural Operators

Chaoyu Liu, Zhonghao Li, Gaohang Chen et al.

Neural operators have achieved promising performance on partial differential equations (PDEs), but most existing models are built on fixed Eulerian coordinates. This mismatch between evolving physical structures and static coordinates creates spatial misalignment, leading to unnecessarily non-local operator mappings and reinforcing a smoothness preference near sharp transitions. Inspired by adaptive coordinate transformations in classical PDE analysis, we propose the Adaptive Coordinate Transform (ACT) block, a plug-and-play module for data-driven geometric adaptation in neural operators. ACT blocks resolve this structural limitation by learning adaptive coordinate systems within the operator learning pipeline. Specifically, given an input feature, the ACT block learns a coordinate transformation and represents the same feature under the transformed coordinates via differentiable sampling. This operation preserves the underlying signal while changing its spatial representation, equivalent to expressing the same physical quantity in different coordinate systems. By adapting the coordinate system to the data, ACT allows the network to better track evolving structures, reduce operator complexity, and dynamically focus on critical features to improve learning. We evaluate the proposed approach across diverse PDE benchmarks and multiple neural operator architectures. Experimental results demonstrate consistent and significant improvements in predictive accuracy, indicating that learning coordinate systems provides a powerful mechanism for enhancing operator learning.

OCFeb 1, 2024
Weakly Convex Regularisers for Inverse Problems: Convergence of Critical Points and Primal-Dual Optimisation

Zakhar Shumaylov, Jeremy Budd, Subhadip Mukherjee et al.

Variational regularisation is the primary method for solving inverse problems, and recently there has been considerable work leveraging deeply learned regularisation for enhanced performance. However, few results exist addressing the convergence of such regularisation, particularly within the context of critical points as opposed to global minimisers. In this paper, we present a generalised formulation of convergent regularisation in terms of critical points, and show that this is achieved by a class of weakly convex regularisers. We prove convergence of the primal-dual hybrid gradient method for the associated variational problem, and, given a Kurdyka-Lojasiewicz condition, an $\mathcal{O}(\log{k}/k)$ ergodic convergence rate. Finally, applying this theory to learned regularisation, we prove universal approximation for input weakly convex neural networks (IWCNN), and show empirically that IWCNNs can lead to improved performance of learned adversarial regularisers for computed tomography (CT) reconstruction.

LGDec 21, 2024
Symplectic Neural Flows for Modeling and Discovery

Priscilla Canizares, Davide Murari, Carola-Bibiane Schönlieb et al.

Hamilton's equations are fundamental for modeling complex physical systems, where preserving key properties such as energy and momentum is crucial for reliable long-term simulations. Geometric integrators are widely used for this purpose, but neural network-based methods that incorporate these principles remain underexplored. This work introduces SympFlow, a time-dependent symplectic neural network designed using parameterized Hamiltonian flow maps. This design allows for backward error analysis and ensures the preservation of the symplectic structure. SympFlow allows for two key applications: (i) providing a time-continuous symplectic approximation of the exact flow of a Hamiltonian system--purely based on the differential equations it satisfies, and (ii) approximating the flow map of an unknown Hamiltonian system relying on trajectory data. We demonstrate the effectiveness of SympFlow on diverse problems, including chaotic and dissipative systems, showing improved energy conservation compared to general-purpose numerical methods and accurate

LGOct 2, 2025
Learning Regularization Functionals for Inverse Problems: A Comparative Study

Johannes Hertrich, Hok Shing Wong, Alexander Denker et al.

In recent years, a variety of learned regularization frameworks for solving inverse problems in imaging have emerged. These offer flexible modeling together with mathematical insights. The proposed methods differ in their architectural design and training strategies, making direct comparison challenging due to non-modular implementations. We address this gap by collecting and unifying the available code into a common framework. This unified view allows us to systematically compare the approaches and highlight their strengths and limitations, providing valuable insights into their future potential. We also provide concise descriptions of each method, complemented by practical guidelines.

LGFeb 1, 2025
Generalized Lie Symmetries in Physics-Informed Neural Operators

Amy Xiang Wang, Zakhar Shumaylov, Peter Zaika et al.

Physics-informed neural operators (PINOs) have emerged as powerful tools for learning solution operators of partial differential equations (PDEs). Recent research has demonstrated that incorporating Lie point symmetry information can significantly enhance the training efficiency of PINOs, primarily through techniques like data, architecture, and loss augmentation. In this work, we focus on the latter, highlighting that point symmetries oftentimes result in no training signal, limiting their effectiveness in many problems. To address this, we propose a novel loss augmentation strategy that leverages evolutionary representatives of point symmetries, a specific class of generalized symmetries of the underlying PDE. These generalized symmetries provide a richer set of generators compared to standard symmetries, leading to a more informative training signal. We demonstrate that leveraging evolutionary representatives enhances the performance of neural operators, resulting in improved data efficiency and accuracy during training.

LGOct 23, 2024
Hamiltonian Matching for Symplectic Neural Integrators

Priscilla Canizares, Davide Murari, Carola-Bibiane Schönlieb et al.

Hamilton's equations of motion form a fundamental framework in various branches of physics, including astronomy, quantum mechanics, particle physics, and climate science. Classical numerical solvers are typically employed to compute the time evolution of these systems. However, when the system spans multiple spatial and temporal scales numerical errors can accumulate, leading to reduced accuracy. To address the challenges of evolving such systems over long timescales, we propose SympFlow, a novel neural network-based symplectic integrator, which is the composition of a sequence of exact flow maps of parametrised time-dependent Hamiltonian functions. This architecture allows for a backward error analysis: we can identify an underlying Hamiltonian function of the architecture and use it to define a Hamiltonian matching objective function, which we use for training. In numerical experiments, we show that SympFlow exhibits promising results, with qualitative energy conservation behaviour similar to that of time-stepping symplectic integrators.

LGJul 12, 2025
On Information Geometry and Iterative Optimization in Model Compression: Operator Factorization

Zakhar Shumaylov, Vasileios Tsiaras, Yannis Stylianou

The ever-increasing parameter counts of deep learning models necessitate effective compression techniques for deployment on resource-constrained devices. This paper explores the application of information geometry, the study of density-induced metrics on parameter spaces, to analyze existing methods within the space of model compression, primarily focusing on operator factorization. Adopting this perspective highlights the core challenge: defining an optimal low-compute submanifold (or subset) and projecting onto it. We argue that many successful model compression approaches can be understood as implicitly approximating information divergences for this projection. We highlight that when compressing a pre-trained model, using information divergences is paramount for achieving improved zero-shot accuracy, yet this may no longer be the case when the model is fine-tuned. In such scenarios, trainability of bottlenecked models turns out to be far more important for achieving high compression ratios with minimal performance degradation, necessitating adoption of iterative methods. In this context, we prove convergence of iterative singular value thresholding for training neural networks subject to a soft rank constraint. To further illustrate the utility of this perspective, we showcase how simple modifications to existing methods through softer rank reduction result in improved performance under fixed compression rates.

NAJun 13, 2025
Data-driven approaches to inverse problems

Carola-Bibiane Schönlieb, Zakhar Shumaylov

Inverse problems are concerned with the reconstruction of unknown physical quantities using indirect measurements and are fundamental across diverse fields such as medical imaging, remote sensing, and material sciences. These problems serve as critical tools for visualizing internal structures beyond what is visible to the naked eye, enabling quantification, diagnosis, prediction, and discovery. However, most inverse problems are ill-posed, necessitating robust mathematical treatment to yield meaningful solutions. While classical approaches provide mathematically rigorous and computationally stable solutions, they are constrained by the ability to accurately model solution properties and implement them efficiently. A more recent paradigm considers deriving solutions to inverse problems in a data-driven manner. Instead of relying on classical mathematical modeling, this approach utilizes highly over-parameterized models, typically deep neural networks, which are adapted to specific inverse problems using carefully selected training data. Current approaches that follow this new paradigm distinguish themselves through solution accuracy paired with computational efficiency that was previously inconceivable. These notes offer an introduction to this data-driven paradigm for inverse problems. The first part of these notes will provide an introduction to inverse problems, discuss classical solution strategies, and present some applications. The second part will delve into modern data-driven approaches, with a particular focus on adversarial regularization and provably convergent linear plug-and-play denoisers. Throughout the presentation of these methodologies, their theoretical properties will be discussed, and numerical examples will be provided. The lecture series will conclude with a discussion of open problems and future perspectives in the field.

LGMay 27, 2023
The Curse of Recursion: Training on Generated Data Makes Models Forget

Ilia Shumailov, Zakhar Shumaylov, Yiren Zhao et al.

Stable Diffusion revolutionised image creation from descriptive text. GPT-2, GPT-3(.5) and GPT-4 demonstrated astonishing performance across a variety of language tasks. ChatGPT introduced such language models to the general public. It is now clear that large language models (LLMs) are here to stay, and will bring about drastic change in the whole ecosystem of online text and images. In this paper we consider what the future might hold. What will happen to GPT-{n} once LLMs contribute much of the language found online? We find that use of model-generated content in training causes irreversible defects in the resulting models, where tails of the original content distribution disappear. We refer to this effect as Model Collapse and show that it can occur in Variational Autoencoders, Gaussian Mixture Models and LLMs. We build theoretical intuition behind the phenomenon and portray its ubiquity amongst all learned generative models. We demonstrate that it has to be taken seriously if we are to sustain the benefits of training from large-scale data scraped from the web. Indeed, the value of data collected about genuine human interactions with systems will be increasingly valuable in the presence of content generated by LLMs in data crawled from the Internet.

LGApr 19, 2021
Manipulating SGD with Data Ordering Attacks

Ilia Shumailov, Zakhar Shumaylov, Dmitry Kazhdan et al.

Machine learning is vulnerable to a wide variety of attacks. It is now well understood that by changing the underlying data distribution, an adversary can poison the model trained with it or introduce backdoors. In this paper we present a novel class of training-time attacks that require no changes to the underlying dataset or model architecture, but instead only change the order in which data are supplied to the model. In particular, we find that the attacker can either prevent the model from learning, or poison it to learn behaviours specified by the attacker. Furthermore, we find that even a single adversarially-ordered epoch can be enough to slow down model learning, or even to reset all of the learning progress. Indeed, the attacks presented here are not specific to the model or dataset, but rather target the stochastic nature of modern learning procedures. We extensively evaluate our attacks on computer vision and natural language benchmarks to find that the adversary can disrupt model training and even introduce backdoors.

LGAug 6, 2020
Learned convex regularizers for inverse problems

Subhadip Mukherjee, Sören Dittmer, Zakhar Shumaylov et al.

We consider the variational reconstruction framework for inverse problems and propose to learn a data-adaptive input-convex neural network (ICNN) as the regularization functional. The ICNN-based convex regularizer is trained adversarially to discern ground-truth images from unregularized reconstructions. Convexity of the regularizer is desirable since (i) one can establish analytical convergence guarantees for the corresponding variational reconstruction problem and (ii) devise efficient and provable algorithms for reconstruction. In particular, we show that the optimal solution to the variational problem converges to the ground-truth if the penalty parameter decays sub-linearly with respect to the norm of the noise. Further, we prove the existence of a sub-gradient-based algorithm that leads to a monotonically decreasing error in the parameter space with iterations. To demonstrate the performance of our approach for solving inverse problems, we consider the tasks of deblurring natural images and reconstructing images in computed tomography (CT), and show that the proposed convex regularizer is at least competitive with and sometimes superior to state-of-the-art data-driven techniques for inverse problems.