LG Jun 28, 2022
Neural Integro-Differential Equations
Emanuele Zappala, Antonio Henrique de Oliveira Fonseca, Andrew Henry Moberly et al.
Modeling continuous dynamical systems from discretely sampled observations is a fundamental problem in data science. Often, such dynamics are the result of non-local processes that involve an integral over time. As such, these systems are modeled with Integro-Differential Equations (IDEs), generalizations of differential equations that comprise both an integral and a differential component. For example, brain dynamics are not accurately modeled by differential equations since their behavior is non-Markovian, i.e., the dynamics are in part dictated by history. Here, we introduce the Neural IDE (NIDE), a novel deep learning framework based on the theory of IDEs where integral operators are learned using neural networks. We test NIDE on several toy and brain activity datasets and demonstrate that NIDE outperforms other models. These tasks include time extrapolation as well as predicting dynamics from unseen initial conditions, which we test on whole-cortex activity recordings in freely behaving mice. Further, we show that NIDE can decompose dynamics into their Markovian and non-Markovian constituents via the learned integral operator, which we test on fMRI brain activity recordings of people on ketamine. Finally, the integrand of the integral operator provides a latent space that gives insight into the underlying dynamics, which we demonstrate on wide-field brain imaging recordings. Altogether, NIDE is a novel approach that enables modeling of complex non-local dynamics with neural networks.
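For orientation, one common general form of an integro-differential equation is sketched below. The symbols ($f$, $G$, $\alpha$, $\beta$, $y_0$) are illustrative and are not taken from the abstract. The differential term $f$ depends only on the current state (the Markovian part), while the integral term accumulates the history of $y$ (the non-Markovian part). In a neural IDE of the kind described above, $f$ and $G$ would presumably be parameterized by neural networks.

```latex
\frac{dy}{dt}(t) \;=\; f\bigl(t, y(t)\bigr) \;+\; \int_{\alpha(t)}^{\beta(t)} G\bigl(t, s, y(s)\bigr)\, ds,
\qquad y(t_0) = y_0 .
```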
LG Sep 30, 2022
Neural Integral Equations
Emanuele Zappala, Antonio Henrique de Oliveira Fonseca, Josue Ortega Caro et al.
Nonlinear operators with long distance spatiotemporal dependencies are fundamental in modeling complex systems across sciences, yet learning these nonlocal operators remains challenging in machine learning. Integral equations (IEs), which model such nonlocal systems, have wide ranging applications in physics, chemistry, biology, and engineering. We introduce Neural Integral Equations (NIE), a method for learning unknown integral operators from data using an IE solver. To improve scalability and model capacity, we also present Attentional Neural Integral Equations (ANIE), which replaces the integral with self-attention. Both models are grounded in the theory of second kind integral equations, where the indeterminate appears both inside and outside the integral operator. We provide theoretical analysis showing how self-attention can approximate integral operators under mild regularity assumptions, further deepening previously reported connections between transformers and integration, and deriving corresponding approximation results for integral operators. Through numerical benchmarks on synthetic and real world data, including Lotka-Volterra, Navier-Stokes, and Burgers' equations, as well as brain dynamics and integral equations, we showcase the models' capabilities and their ability to derive interpretable dynamics embeddings. Our experiments demonstrate that ANIE outperforms existing methods, especially for longer time intervals and higher dimensional problems. Our work addresses a critical gap in machine learning for nonlocal operators and offers a powerful tool for studying unknown complex systems with long range dependencies.
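As a rough illustration of what "second kind" means here, the sketch below solves a linear Fredholm equation of the second kind, y(t) = f(t) + λ ∫₀¹ K(t, s) y(s) ds, by fixed-point (Picard) iteration with trapezoidal quadrature. The unknown y appears both outside and inside the integral. This is not the NIE or ANIE solver: the kernel is fixed and known rather than learned, and the function name, kernel, grid size, and tolerances are all assumptions chosen for the example.

```python
import numpy as np

def solve_fredholm_second_kind(f, kernel, lam=1.0, n=201, tol=1e-10, max_iter=500):
    """Picard iteration for y(t) = f(t) + lam * int_0^1 K(t, s) y(s) ds.

    Hypothetical helper for illustration; converges when the integral
    operator is a contraction (e.g. a sufficiently small kernel).
    """
    t = np.linspace(0.0, 1.0, n)
    # Trapezoidal quadrature weights on the uniform grid.
    w = np.full(n, 1.0 / (n - 1))
    w[0] *= 0.5
    w[-1] *= 0.5
    K = kernel(t[:, None], t[None, :])  # K[i, j] = K(t_i, s_j)
    f_vals = f(t)
    y = f_vals.copy()  # initial guess: y_0 = f
    for _ in range(max_iter):
        # Apply the right-hand side: f + lam * (quadrature of K * y).
        y_new = f_vals + lam * (K * w) @ y
        if np.max(np.abs(y_new - y)) < tol:
            return t, y_new
        y = y_new
    return t, y

# Example with a known solution: K(t, s) = t * s and f(t) = 2t/3 give y(t) = t.
t, y = solve_fredholm_second_kind(lambda t: 2.0 * t / 3.0, lambda t, s: t * s)
max_error = float(np.max(np.abs(y - t)))
```

In this example the remaining error comes from the quadrature rule rather than from the iteration, so refining the grid (larger `n`) reduces it.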
LG May 19
Nonlocal operator learning for fMRI encoding and decoding tasks
Andreas Kramer, Saugat Acharya, Alice Giola et al.
Functional MRI data exhibit high-dimensional spatiotemporal structure, making both prediction and decoding challenging. In this work, we investigate neural integral-operator-based models for encoding and decoding tasks in fMRI, with particular emphasis on the role of nonlocal spatiotemporal context. We implement a latent neural integral operator framework that performs fixed point iterations in an auxiliary space, from which classification and stimulus prediction are performed via a decoder. We evaluate our model on two open-source fMRI datasets. Our experiments examine both decoding of stimuli from fMRI recordings and encoding of fMRI dynamics from stimulus representations. A main focus is the effect of spatiotemporal context: we systematically compare short and long temporal windows, as well as the use of visual cortex vs. whole brain recordings, and analyze their influence on performance and latent-space geometry. Across tasks and datasets, larger temporal windows generally improve results and produce more structured learned representations. In decoding experiments, the learned latent space often provides clearer class separation than the raw data. In encoding experiments, although absolute performance remains moderate due to the difficulty of the task, longer temporal windows still yield consistent gains. These findings suggest that neural integral operators provide a promising framework for modeling fMRI dynamics and that broader spatiotemporal context can be beneficial for both prediction and representation learning. More broadly, the results indicate that exploiting distributed nonlocal structure in brain dynamics requires model architectures specifically designed to capture such dependencies.
LG Oct 17, 2022
FIMP: Foundation Model-Informed Message Passing for Graph Neural Networks
Syed Asad Rizvi, Nazreen Pallikkavaliyaveetil, David Zhang et al.
Foundation models have achieved remarkable success across many domains, relying on pretraining over vast amounts of data. Graph-structured data often lacks the same scale as unstructured data, making the development of graph foundation models challenging. In this work, we propose Foundation-Informed Message Passing (FIMP), a Graph Neural Network (GNN) message-passing framework that leverages pretrained non-textual foundation models in graph-based tasks. We show that the self-attention layers of foundation models can effectively be repurposed on graphs to perform cross-node attention-based message-passing. Our model is evaluated on a real-world image network dataset and two biological applications (single-cell RNA sequencing data and fMRI brain activity recordings) in both finetuned and zero-shot settings. FIMP outperforms strong baselines, demonstrating that it can effectively leverage state-of-the-art foundation models in graph tasks.
QUANT-PH Oct 25, 2022
Deep Neural Networks as the Semi-classical Limit of Topological Quantum Neural Networks: The problem of generalisation
Antonino Marciano, Emanuele Zappala, Tommaso Torda et al.
Deep Neural Networks lack a principled model of their operation. A novel framework for supervised learning based on Topological Quantum Field Theory, which looks particularly well suited for implementation on quantum processors, has recently been explored. We propose using this framework to understand the problem of generalisation in Deep Neural Networks. More specifically, in this approach, Deep Neural Networks are viewed as the semi-classical limit of Topological Quantum Neural Networks. A framework of this kind explains the overfitting behavior of Deep Neural Networks during the training step and the corresponding generalisation capabilities. We explore the paradigmatic case of the perceptron, which we implement as the semi-classical limit of Topological Quantum Neural Networks. We apply a novel algorithm we developed, showing that it obtains similar results to standard neural networks, but without the need for training (optimisation).
LG Jan 31, 2023
Continuous Spatiotemporal Transformers
Antonio H. de O. Fonseca, Emanuele Zappala, Josue Ortega Caro et al.
Modeling spatiotemporal dynamical systems is a fundamental challenge in machine learning. Transformer models have been very successful in NLP and computer vision where they provide interpretable representations of data. However, a limitation of transformers in modeling continuous dynamical systems is that they are fundamentally discrete time and space models and thus have no guarantees regarding continuous sampling. To address this challenge, we present the Continuous Spatiotemporal Transformer (CST), a new transformer architecture that is designed for the modeling of continuous systems. This new framework guarantees a continuous and smooth output via optimization in Sobolev space. We benchmark CST against traditional transformers as well as other spatiotemporal dynamics modeling methods and achieve superior performance in a number of tasks on synthetic and real systems, including learning brain dynamics from calcium imaging data.
LG Sep 1, 2024
Universal Approximation of Operators with Transformers and Neural Integral Operators
Emanuele Zappala, Maryam Bagherian
We study the universal approximation properties of transformers and neural integral operators for operators in Banach spaces. In particular, we show that the transformer architecture is a universal approximator of integral operators between Hölder spaces. Moreover, we show that a generalized version of neural integral operators, based on the Gavurin integral, is a universal approximator of arbitrary operators between Banach spaces. Lastly, we show that a modified version of the transformer, which uses Leray-Schauder mappings, is a universal approximator of operators between arbitrary Banach spaces.
LG Oct 2, 2023
Operator Learning Meets Numerical Analysis: Improving Neural Networks through Iterative Methods
Emanuele Zappala, Daniel Levine, Sizhuang He et al.
Deep neural networks, despite their success in numerous applications, often function without established theoretical foundations. In this paper, we bridge this gap by drawing parallels between deep learning and classical numerical analysis. By framing neural networks as operators with fixed points representing desired solutions, we develop a theoretical framework grounded in iterative methods for operator equations. Under defined conditions, we present convergence proofs based on fixed point theory. We demonstrate that popular architectures, such as diffusion models and AlphaFold, inherently employ iterative operator learning. Empirical assessments highlight that performing iterations through network operators improves performance. We also introduce an iterative graph neural network, PIGN, that further demonstrates benefits of iterations. Our work aims to enhance the understanding of deep learning by merging insights from numerical analysis, potentially guiding the design of future networks with clearer theoretical underpinnings and improved performance.
NA Dec 9, 2023
Spectral methods for Neural Integral Equations
Emanuele Zappala
Neural integral equations are deep learning models based on the theory of integral equations, where the model consists of an integral operator and the corresponding equation (of the second kind), which is learned through an optimization procedure. This approach makes it possible to leverage the nonlocal properties of integral operators in machine learning, but it is computationally expensive. In this article, we introduce a framework for neural integral equations based on spectral methods that allows us to learn an operator in the spectral domain, resulting in a lower computational cost as well as high interpolation accuracy. We study the properties of our methods and show various theoretical guarantees regarding the approximation capabilities of the model and the convergence of the numerical methods to solutions. We provide numerical experiments to demonstrate the practical effectiveness of the resulting model.
LG Feb 13, 2025
Non-Markovian Discrete Diffusion with Causal Language Models
Yangtian Zhang, Sizhuang He, Daniel Levine et al.
Discrete diffusion models offer a flexible, controllable approach to structured sequence generation, yet they still lag behind causal language models in expressive power. A key limitation lies in their reliance on the Markovian assumption, which restricts each step to condition only on the current state, leading to potential uncorrectable error accumulation. In this paper, we introduce CaDDi (Causal Discrete Diffusion Model), a discrete diffusion model that conditions on the entire generative trajectory, thereby lifting the Markov constraint and allowing the model to revisit and improve past states. By unifying sequential (causal) and temporal (diffusion) reasoning in a single non-Markovian transformer, CaDDi also treats standard causal language models as a special case and permits the direct reuse of pretrained LLM weights with no architectural changes. Empirically, CaDDi outperforms state-of-the-art discrete diffusion baselines on natural-language benchmarks, substantially narrowing the remaining gap to large autoregressive transformers.
LG May 6, 2025
Neural Integral Operators for Inverse problems in Spectroscopy
Emanuele Zappala, Alice Giola, Andreas Kramer et al.
Deep learning has shown high performance on spectroscopic inverse problems when sufficient data is available. However, data in spectroscopy is often scarce, and this usually causes severe overfitting problems with deep learning methods. Traditional machine learning methods are viable when datasets are smaller, but the accuracy and applicability of these methods are generally more limited. We introduce a deep learning method for classification of molecular spectra based on learning integral operators via integral equations of the first kind, which results in an algorithm that is less affected by overfitting on small datasets than other deep learning models. The problem formulation of the deep learning approach is based on inverse problems, which have traditionally found important applications in spectroscopy. We perform experiments on real world data to showcase our algorithm. The model outperforms traditional machine learning approaches such as decision trees and support vector machines, and for small datasets it outperforms other deep learning models. Therefore, our methodology leverages the power of deep learning while maintaining performance when the available data is very limited, which is one of the main issues that deep learning faces in spectroscopy, where datasets are often small.
NA Jun 18, 2024
Projection Methods for Operator Learning and Universal Approximation
Emanuele Zappala
We obtain a new universal approximation theorem for continuous (possibly nonlinear) operators on arbitrary Banach spaces using the Leray-Schauder mapping. Moreover, we introduce and study a method for operator learning in Banach spaces $L^p$ of functions with multiple variables, based on orthogonal projections on polynomial bases. We derive a universal approximation result for operators where we learn a linear projection and a finite dimensional mapping under some additional assumptions. For the case of $p=2$, we give some sufficient conditions for the approximation results to hold. This article serves as the theoretical framework for a deep learning methodology in operator learning.
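To make "orthogonal projections on polynomial bases" concrete, here is a minimal sketch of the simplest setting: projecting a single-variable function in $L^2([-1, 1])$ onto Legendre polynomials up to a given degree. It is only an illustration of the projection step under these assumptions, not the operator learning method of the article; the choice of Legendre polynomials, the interval, and the helper names are all hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre as L

def legendre_projection(f, degree, n_quad=64):
    """Coefficients of the L^2([-1, 1]) projection of f onto P_0, ..., P_degree.

    Hypothetical helper: uses c_k = (2k + 1) / 2 * int_{-1}^{1} f(x) P_k(x) dx,
    with the integral approximated by Gauss-Legendre quadrature.
    """
    x, w = L.leggauss(n_quad)        # quadrature nodes and weights on [-1, 1]
    V = L.legvander(x, degree)       # V[i, k] = P_k(x_i)
    k = np.arange(degree + 1)
    return (2 * k + 1) / 2.0 * (V.T @ (w * f(x)))

def evaluate_projection(coeffs, x):
    """Evaluate the projected function sum_k c_k P_k(x)."""
    return L.legval(x, coeffs)

# Example: project exp(x) onto polynomials of degree <= 8.
coeffs = legendre_projection(np.exp, degree=8)
grid = np.linspace(-1.0, 1.0, 101)
approx_error = float(np.max(np.abs(evaluate_projection(coeffs, grid) - np.exp(grid))))
```

In an operator learning setting, a finite coefficient vector of this kind is the sort of object a finite dimensional mapping could act on, with the projection handling the passage from functions to coefficients.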