ML · Jul 24, 2023
Extending Path-Dependent NJ-ODEs to Noisy Observations and a Dependent Observation Framework
William Andersson, Jakob Heiss, Florian Krach et al. · berkeley
The Path-Dependent Neural Jump Ordinary Differential Equation (PD-NJ-ODE) is a model for predicting continuous-time stochastic processes with irregular and incomplete observations. In particular, the method learns optimal forecasts given irregularly sampled time series of incomplete past observations. So far the process itself and the coordinate-wise observation times were assumed to be independent and observations were assumed to be noiseless. In this work we discuss two extensions to lift these restrictions and provide theoretical guarantees as well as empirical examples for them. In particular, we can lift the assumption of independence by extending the theory to much more realistic settings of conditional independence without any need to change the algorithm. Moreover, we introduce a new loss function, which allows us to deal with noisy observations and explain why the previously used loss function did not lead to a consistent estimator.
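The following minimal numerical sketch shows why noisy observations call for a different loss. It does not reproduce the paper's model or its new loss. It assumes a Brownian motion observed at $t=1$ and $t=2$ with additive, independent, mean-zero Gaussian noise, and compares two forecasts of the latent value $X_2$ given the first noisy observation $O_1$. The first forecast "jumps to the observation", as the earlier noiseless loss encourages. The second is a least-squares fit against the *noisy* next observation. Because the noise has mean zero, the second forecast still recovers the noiseless conditional expectation $E[X_2 \mid O_1]$, while the first forecast carries the observation noise forward.

```python
import numpy as np

rng = np.random.default_rng(0)
n_paths = 200_000
noise_std = 1.0  # assumed observation-noise level (illustrative)

# Latent process: Brownian motion sampled at t=1 and t=2 (assumption of this sketch).
x1 = rng.normal(0.0, 1.0, n_paths)
x2 = x1 + rng.normal(0.0, 1.0, n_paths)

# Noisy observations O = X + eps with independent, mean-zero noise.
o1 = x1 + rng.normal(0.0, noise_std, n_paths)
o2 = x2 + rng.normal(0.0, noise_std, n_paths)

# Forecast A: "jump to the last observation", i.e. predict X_2 by O_1.
err_jump = np.mean((o1 - x2) ** 2)

# Forecast B: least-squares regression of the *noisy* target O_2 on O_1.
slope, intercept = np.polyfit(o1, o2, deg=1)
err_ls = np.mean((slope * o1 + intercept - x2) ** 2)

# L2-optimal forecast of X_2 given O_1 is E[X_1 | O_1] = O_1 / (1 + noise_std**2).
theory_slope = 1.0 / (1.0 + noise_std**2)
print(f"slope={slope:.3f} (theory {theory_slope:.3f})")
print(f"MSE vs latent X_2: jump={err_jump:.3f}, least-squares={err_ls:.3f}")
```

In this setup the theoretical mean squared errors are 2.0 for the jump forecast and 1.5 for the least-squares forecast.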
CP · Feb 8, 2018
Deep Hedging
Hans Bühler, Lukas Gonon, Josef Teichmann et al.
We present a framework for hedging a portfolio of derivatives in the presence of market frictions such as transaction costs, market impact, liquidity constraints or risk limits using modern deep reinforcement learning methods. We discuss how standard reinforcement learning methods can be applied to non-linear reward structures, i.e. in our case convex risk measures. As a general contribution to the use of deep learning for stochastic processes, we also show that the set of constrained trading strategies used by our algorithm is large enough to $ε$-approximate any optimal solution. Our algorithm can be implemented efficiently even in high-dimensional situations using modern machine learning tools. Its structure does not depend on specific market dynamics, and generalizes across hedging instruments including the use of liquid derivatives. Its computational performance is largely invariant in the size of the portfolio as it depends mainly on the number of hedging instruments available. We illustrate our approach by showing the effect on hedging under transaction costs in a synthetic market driven by the Heston model, where we outperform the standard "complete market" solution.
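The training objective described above is a convex risk measure of the terminal hedging P&L, net of transaction costs. The sketch below does not train a network. It only evaluates that objective, under several assumptions:

- The risk measure is the entropic risk measure $\rho(X) = \frac{1}{\lambda}\log E[e^{-\lambda X}]$, one convex risk measure of this kind.
- Transaction costs are proportional to the traded notional.
- The market is a zero-rate Black-Scholes market rather than Heston.
- A Black-Scholes delta hedge stands in for a learned strategy.

All parameter values are illustrative.

```python
import numpy as np
from scipy.stats import norm

def entropic_risk(pnl, lam=1.0):
    """Entropic risk rho(X) = log(E[exp(-lam * X)]) / lam, computed stably (assumed risk measure)."""
    z = -lam * pnl
    m = z.max()
    return (m + np.log(np.mean(np.exp(z - m)))) / lam

def hedge_pnl(paths, deltas, payoff, cost=0.0):
    """P&L of a short claim; deltas[:, k] is the holding over (t_k, t_{k+1}]."""
    gains = np.sum(deltas * np.diff(paths, axis=1), axis=1)
    # Proportional costs on every trade, including the initial trade and the final liquidation.
    zeros = np.zeros((len(paths), 1))
    trades = np.abs(np.diff(np.concatenate([zeros, deltas, zeros], axis=1), axis=1))
    costs = cost * np.sum(trades * paths, axis=1)
    return -payoff + gains - costs

# Illustrative market: zero-rate Black-Scholes paths (an assumption, not the paper's Heston setup).
rng = np.random.default_rng(1)
s0, strike, sigma, maturity, n_steps, n_paths = 100.0, 100.0, 0.2, 1 / 12, 30, 20_000
dt = maturity / n_steps
z = rng.standard_normal((n_paths, n_steps))
log_paths = np.cumsum(-0.5 * sigma**2 * dt + sigma * np.sqrt(dt) * z, axis=1)
paths = s0 * np.exp(np.concatenate([np.zeros((n_paths, 1)), log_paths], axis=1))
payoff = np.maximum(paths[:, -1] - strike, 0.0)

# Black-Scholes delta at t_0, ..., t_{N-1}, standing in for a learned hedging strategy.
tau = maturity - dt * np.arange(n_steps)
d1 = (np.log(paths[:, :-1] / strike) + 0.5 * sigma**2 * tau) / (sigma * np.sqrt(tau))
bs_delta = norm.cdf(d1)

unhedged = entropic_risk(hedge_pnl(paths, np.zeros_like(bs_delta), payoff, cost=0.001))
hedged = entropic_risk(hedge_pnl(paths, bs_delta, payoff, cost=0.001))
print(f"entropic risk: unhedged={unhedged:.2f}, delta-hedged={hedged:.2f}")
```

In a deep-hedging setup, `bs_delta` would be replaced by the output of a network, and `entropic_risk` would be minimized over that network's parameters.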
LG · Mar 20, 2023
How (Implicit) Regularization of ReLU Neural Networks Characterizes the Learned Function -- Part II: the Multi-D Case of Two Layers with Random First Layer
Jakob Heiss, Josef Teichmann, Hanna Wutte · berkeley
Randomized neural networks (randomized NNs), where only the terminal layer's weights are optimized, constitute a powerful model class to reduce computational time in training the neural network model. At the same time, these models generalize surprisingly well in various regression and classification tasks. In this paper, we give an exact macroscopic characterization (i.e., a characterization in function space) of the generalization behavior of randomized, shallow NNs with ReLU activation (RSNs). We show that RSNs correspond to a generalized additive model (GAM)-type regression in which infinitely many directions are considered: the infinite generalized additive model (IGAM). The IGAM is formalized as the solution to an optimization problem in function space for a specific regularization functional and a fairly general loss. This work extends our prior work to multivariate NNs; there we showed that wide RSNs with ReLU activation behave like spline regression under certain conditions when the input is one-dimensional.
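A randomized shallow ReLU network of this kind freezes a randomly drawn first layer and trains only the linear output layer, so training reduces to a linear least-squares problem. The minimal sketch below rests on several assumptions:

- The first-layer weights and biases are Gaussian.
- The output weights are trained with ridge (L2) regularization.
- The width, penalty, and target function are illustrative choices.

```python
import numpy as np

def fit_rsn(x, y, width=500, ridge=1e-3, seed=0):
    """Randomized shallow ReLU network: random frozen hidden layer, ridge-trained readout."""
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    w = rng.standard_normal((d, width))    # first-layer weights (assumed Gaussian), never trained
    b = rng.standard_normal(width)         # first-layer biases (assumed Gaussian), never trained
    features = np.maximum(x @ w + b, 0.0)  # ReLU activations, shape (n, width)
    # Closed-form ridge solution for the terminal layer only.
    gram = features.T @ features + ridge * np.eye(width)
    v = np.linalg.solve(gram, features.T @ y)
    return lambda x_new: np.maximum(x_new @ w + b, 0.0) @ v

# Illustrative two-dimensional regression problem.
rng = np.random.default_rng(42)
x_train = rng.uniform(-2, 2, size=(200, 2))
y_train = np.sin(x_train[:, 0]) + 0.5 * x_train[:, 1] ** 2
model = fit_rsn(x_train, y_train)
train_mse = np.mean((model(x_train) - y_train) ** 2)
print(f"training MSE: {train_mse:.4f}")
```

The abstract characterizes the function learned by this kind of model, in the wide limit, as an infinite generalized additive model.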
PR · Mar 21, 2012
Polynomial processes and their applications to mathematical Finance
Christa Cuchiero, Martin Keller-Ressel, Josef Teichmann
We introduce a class of Markov processes, called $m$-polynomial, for which the calculation of (mixed) moments up to order $m$ only requires the computation of matrix exponentials. This class contains affine processes, processes with quadratic diffusion coefficients, as well as Lévy-driven SDEs with affine vector fields. Thus, many popular models such as exponential Lévy models or affine models are covered by this setting. The applications range from statistical GMM estimation procedures to new techniques for option pricing and hedging. For instance, the efficient and easy computation of moments can be used for variance reduction techniques in Monte Carlo methods.
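The Ornstein-Uhlenbeck process $dX_t = -\kappa X_t\,dt + \sigma\,dW_t$ is a simple polynomial process, chosen here only for illustration. Its generator maps polynomials of degree at most 2 into themselves. On the basis $(1, x, x^2)$, the generator therefore acts on coefficient vectors through a $3\times 3$ matrix $A$. As a result, $E[p(X_t) \mid X_0 = x]$ is the polynomial whose coefficients are $e^{tA}c$, where $c$ is the coefficient vector of $p$. The sketch computes this with `scipy.linalg.expm` and compares it with the closed-form OU moments.

```python
import numpy as np
from scipy.linalg import expm

def ou_generator_matrix(kappa, sigma):
    """Matrix of G f = -kappa x f' + sigma^2/2 f'' on coefficient vectors (c0, c1, c2)."""
    # Columns are images of the basis: G 1 = 0, G x = -kappa x, G x^2 = sigma^2 - 2 kappa x^2.
    return np.array([
        [0.0, 0.0, sigma**2],
        [0.0, -kappa, 0.0],
        [0.0, 0.0, -2.0 * kappa],
    ])

def polynomial_moment(coeffs, x0, t, kappa, sigma):
    """E[p(X_t) | X_0 = x0] for p(x) = c0 + c1 x + c2 x^2, via one matrix exponential."""
    c_t = expm(t * ou_generator_matrix(kappa, sigma)) @ np.asarray(coeffs, dtype=float)
    return c_t @ np.array([1.0, x0, x0**2])

kappa, sigma, x0, t = 1.5, 0.4, 2.0, 0.7  # illustrative parameters
mean = polynomial_moment([0, 1, 0], x0, t, kappa, sigma)
second = polynomial_moment([0, 0, 1], x0, t, kappa, sigma)
print(mean, x0 * np.exp(-kappa * t))
print(second, x0**2 * np.exp(-2 * kappa * t) + sigma**2 / (2 * kappa) * (1 - np.exp(-2 * kappa * t)))
```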
PR · Jan 17, 2010
Jump-Diffusions in Hilbert Spaces: Existence, Stability and Numerics
Damir Filipovic, Stefan Tappe, Josef Teichmann
By means of an original approach, called the "method of the moving frame", we establish existence, uniqueness and stability results for mild and weak solutions of stochastic partial differential equations (SPDEs) with path-dependent coefficients driven by an infinite-dimensional Wiener process and a compensated Poisson random measure. Our approach is based on a time-dependent coordinate transform, which reduces a wide class of SPDEs to a class of simpler SDE problems. We present the most general results obtainable in our setting, within a self-contained framework, to demonstrate our approach in full detail. Several numerical approaches to SPDEs in the spirit of this setting are also presented.
PR · Nov 11, 2010
A Semigroup Point Of View On Splitting Schemes For Stochastic (Partial) Differential Equations
Philipp Doersek, Josef Teichmann
We construct normed spaces of real-valued functions with controlled growth on possibly infinite-dimensional state spaces such that semigroups of positive, bounded operators $(P_t)_{t\ge 0}$ thereon with $\lim_{t\to 0+}P_t f(x)=f(x)$ are in fact strongly continuous. This result applies to prove optimal rates of convergence of splitting schemes for stochastic (partial) differential equations with linearly growing characteristics and for sets of functions with controlled growth. Applications are general Da Prato-Zabczyk type equations and the HJM equations from interest rate theory.
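The splitting idea can be seen in the simplest finite-dimensional, deterministic case. This is an illustration only, not the weighted-space SDE setting of the paper. Both semigroups are matrix exponentials, and $e^{t(A+B)}$ is approximated by composing the flows of $A$ and $B$ over $n$ substeps:

- The Lie scheme $(e^{hA}e^{hB})^n$ is first order.
- The symmetric Strang scheme $(e^{hA/2}e^{hB}e^{hA/2})^n$ is second order.

Doubling $n$ should therefore roughly halve the Lie error and quarter the Strang error. The matrices $A$ and $B$ below are arbitrary non-commuting examples.

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0], [-1.0, 0.0]])  # rotation generator (illustrative)
B = np.array([[-0.5, 0.0], [0.3, 0.1]])  # does not commute with A (illustrative)
t = 1.0
exact = expm(t * (A + B))

def lie(n):
    """First-order Lie splitting with n substeps."""
    h = t / n
    return np.linalg.matrix_power(expm(h * A) @ expm(h * B), n)

def strang(n):
    """Second-order symmetric Strang splitting with n substeps."""
    h = t / n
    half = expm(0.5 * h * A)
    return np.linalg.matrix_power(half @ expm(h * B) @ half, n)

for n in (20, 40, 80):
    err_lie = np.linalg.norm(lie(n) - exact)
    err_strang = np.linalg.norm(strang(n) - exact)
    print(n, f"{err_lie:.2e}", f"{err_strang:.2e}")
```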
ML · Jun 5, 2023
Global universal approximation of functional input maps on weighted spaces
Christa Cuchiero, Philipp Schmocker, Josef Teichmann
We introduce so-called functional input neural networks defined on a possibly infinite-dimensional weighted space with values also in a possibly infinite-dimensional output space. To this end, we use an additive family to map the input weighted space to the hidden layer, on which a non-linear scalar activation function is applied to each neuron, and finally return the output via some linear readouts. Relying on Stone-Weierstrass theorems on weighted spaces, we can prove a global universal approximation result on weighted spaces for continuous functions going beyond the usual approximation on compact sets. This then applies in particular to approximation of (non-anticipative) path space functionals via functional input neural networks. As a further application of the weighted Stone-Weierstrass theorem we prove a global universal approximation result for linear functions of the signature. We also introduce the viewpoint of Gaussian process regression in this setting and emphasize that the reproducing kernel Hilbert spaces of the signature kernels are Cameron-Martin spaces of certain Gaussian processes. This paves the way towards uncertainty quantification for signature kernel regression.
PR · Dec 22, 2011
Efficient simulation and calibration of general HJM models by splitting schemes
Philipp Doersek, Josef Teichmann
We introduce efficient numerical methods for generic HJM equations of interest rate theory by means of high-order weak approximation schemes. These schemes allow for QMC implementations due to the relatively low dimensional integration space. The complexity of the resulting algorithm is considerably lower than the complexity of multi-level MC algorithms as long as the optimal order of QMC-convergence is guaranteed. In order to make the methods applicable to real world problems, we introduce and use the setting of weighted function spaces, such that unbounded payoffs and unbounded characteristics of the equations in question are still allowed. We also provide an implementation, where we efficiently calibrate an HJM equation to caplet data.
PR · Jan 19, 2012
Cubature Methods For Stochastic (Partial) Differential Equations In Weighted Spaces
Philipp Doersek, Josef Teichmann, Dejan Veluscek
The cubature on Wiener space method, a high-order weak approximation scheme, is established for SPDEs in the case of unbounded characteristics and unbounded payoffs. We first introduce a recently described flexible functional analytic framework, so-called weighted spaces, where Feller-like properties hold. A refined analysis of vector fields on weighted spaces then yields optimal convergence rates of cubature methods for stochastic partial differential equations of Da Prato-Zabczyk type. The ubiquitous stability for the local approximation operator within the functional analytic setting is proved for SPDEs; however, in the infinite-dimensional case we need a newly introduced assumption on weak symmetry of the cubature formula. In finite dimensions, we use the UFG condition to obtain optimal rates of convergence on non-uniform meshes for nonsmooth payoffs with exponential growth.
ML · Jun 28, 2022
Optimal Estimation of Generic Dynamics by Path-Dependent Neural Jump ODEs
Florian Krach, Marc Nübel, Josef Teichmann
This paper studies the problem of forecasting general stochastic processes using a path-dependent extension of the Neural Jump ODE (NJ-ODE) framework (Herrera et al., 2021). While NJ-ODE was the first framework to establish convergence guarantees for the prediction of irregularly observed time series, these results were limited to data stemming from Itô-diffusions with complete observations, in particular Markov processes, where all coordinates are observed simultaneously. In this work, we generalise these results to generic, possibly non-Markovian or discontinuous, stochastic processes with incomplete observations, by utilising the reconstruction properties of the signature transform. These theoretical results are supported by empirical studies, where it is shown that the path-dependent NJ-ODE outperforms the original NJ-ODE framework in the case of non-Markovian data. Moreover, we show that PD-NJ-ODE can be applied successfully to classical stochastic filtering problems and to limit order book (LOB) data.
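For piecewise-linear paths, the signature transform can be computed exactly with Chen's identity. The minimal sketch below truncates at level 2:

- Level 1 is the total increment of the path.
- Level 2 collects the iterated integrals $\int\!\int dX^i\,dX^j$.
- The antisymmetric part of level 2 is the Lévy area.

The truncation level, any time augmentation, and the implementation used in PD-NJ-ODE are not specified in this abstract. The function below is illustrative only.

```python
import numpy as np

def signature_level2(path):
    """Levels 1 and 2 of the signature of a piecewise-linear path of shape (n_points, d)."""
    d = path.shape[1]
    s1 = np.zeros(d)
    s2 = np.zeros((d, d))
    for delta in np.diff(path, axis=0):
        # Chen's identity for appending one linear segment with increment `delta`:
        # S2 <- S2 + S1 (x) delta + delta (x) delta / 2, using S1 *before* the update.
        s2 += np.outer(s1, delta) + 0.5 * np.outer(delta, delta)
        s1 += delta
    return s1, s2

path = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])  # move right, then up
s1, s2 = signature_level2(path)
levy_area = 0.5 * (s2[0, 1] - s2[1, 0])
print(s1)         # total increment: [1. 1.]
print(s2)
print(levy_area)  # 0.5 for this path
```

A useful check is the shuffle identity $S^{(2)} + (S^{(2)})^\top = S^{(1)} \otimes S^{(1)}$, which holds for any path.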
ML · Feb 23
JUCAL: Jointly Calibrating Aleatoric and Epistemic Uncertainty in Classification Tasks
Jakob Heiss, Sören Lambrecht, Jakob Weissteiner et al. · berkeley
We study post-calibration uncertainty for trained ensembles of classifiers. Specifically, we consider both aleatoric (label noise) and epistemic (model) uncertainty. Among the most popular and widely used calibration methods in classification are temperature scaling (i.e., pool-then-calibrate) and conformal methods. However, the main shortcoming of these calibration methods is that they do not balance the proportion of aleatoric and epistemic uncertainty. Not balancing these uncertainties can severely misrepresent predictive uncertainty, leading to overconfident predictions in some input regions while being underconfident in others. To address this shortcoming, we present a simple but powerful calibration algorithm Joint Uncertainty Calibration (JUCAL) that jointly calibrates aleatoric and epistemic uncertainty. JUCAL jointly calibrates two constants to weight and scale epistemic and aleatoric uncertainties by optimizing the negative log-likelihood (NLL) on the validation/calibration dataset. JUCAL can be applied to any trained ensemble of classifiers (e.g., transformers, CNNs, or tree-based methods), with minimal computational overhead, without requiring access to the models' internal parameters. We experimentally evaluate JUCAL on various text classification tasks, for ensembles of varying sizes and with different ensembling strategies. Our experiments show that JUCAL significantly outperforms SOTA calibration methods across all considered classification tasks, reducing NLL and predictive set size by up to 15% and 20%, respectively. Interestingly, even applying JUCAL to an ensemble of size 5 can outperform temperature-scaled ensembles of size up to 50 in terms of NLL and predictive set size, resulting in up to 10 times smaller inference costs. Thus, we propose JUCAL as a new go-to method for calibrating ensembles in classification.
ML · Jul 26, 2024
Learning Chaotic Systems and Long-Term Predictions with Neural Jump ODEs
Florian Krach, Josef Teichmann
The Path-dependent Neural Jump ODE (PD-NJ-ODE) is a model for online prediction of generic (possibly non-Markovian) stochastic processes with irregular (in time) and potentially incomplete (with respect to coordinates) observations. It is a model for which convergence to the $L^2$-optimal predictor, which is given by the conditional expectation, is established theoretically. The model is trained solely on a dataset of realizations of the underlying stochastic process, without requiring knowledge of the law of the process. In the case where the underlying process is deterministic, the conditional expectation coincides with the process itself. Therefore, this framework can equivalently be used to learn the dynamics of ODE or PDE systems solely from realizations of the dynamical system with different initial conditions. We showcase the potential of our method by applying it to the chaotic system of a double pendulum. When training the standard PD-NJ-ODE method, we see that the prediction starts to diverge from the true path after about half of the evaluation time. In this work we enhance the model with two novel ideas, each of which independently improves the performance of our modelling setup. The resulting dynamics match the true dynamics of the chaotic system very closely. The same enhancements can be used to provably enable the PD-NJ-ODE to learn long-term predictions for general stochastic datasets, where the standard model fails. This is verified in several experiments.
MA · Sep 12, 2025
Tackling One Health Risks: How Large Language Models are leveraged for Risk Negotiation and Consensus-building
Alexandra Fetsch, Iurii Savvateev, Racem Ben Romdhane et al.
Key global challenges of our times are characterized by complex interdependencies and can only be effectively addressed through an integrated, participatory effort. Conventional risk analysis frameworks often reduce complexity to ensure manageability, creating silos that hinder comprehensive solutions. A fundamental shift towards holistic strategies is essential to enable effective negotiations between different sectors and to balance the competing interests of stakeholders. However, achieving this balance is often hindered by limited time, vast amounts of information, and the complexity of integrating diverse perspectives. This study presents an AI-assisted negotiation framework that incorporates large language models (LLMs) and AI-based autonomous agents into a negotiation-centered risk analysis workflow. The framework enables stakeholders to simulate negotiations, systematically model dynamics, anticipate compromises, and evaluate solution impacts. By leveraging LLMs' semantic analysis capabilities, we could mitigate information overload and augment the decision-making process under time constraints. Proof-of-concept implementations were conducted in two real-world scenarios: (i) prudent use of a biopesticide, and (ii) targeted wild animal population control. Our work demonstrates the potential of AI-assisted negotiation to address the current lack of tools for cross-sectoral engagement. Importantly, the solution's open-source, web-based design makes it suitable for a broader audience with limited resources and enables users to tailor and develop it for their own needs.
PM · Dec 27, 2023
Randomized Signature Methods in Optimal Portfolio Selection
Erdinc Akyildirim, Matteo Gambara, Josef Teichmann et al.
We present convincing empirical results on the application of Randomized Signature Methods for non-linear, non-parametric drift estimation for a multi-variate financial market. Even though drift estimation is notoriously ill-defined due to the small signal-to-noise ratio, one can still try to learn optimal non-linear maps from data to future returns for the purposes of portfolio optimization. Randomized Signatures, in contrast to classical signatures, can handle high-dimensional markets and provide features on the same scale. We do not contribute to the theory of Randomized Signatures here, but rather present our empirical findings on portfolio selection in real-world settings including real market data and transaction costs.
CL · Sep 30, 2025
IMProofBench: Benchmarking AI on Research-Level Mathematical Proof Generation
Johannes Schmitt, Gergely Bérczi, Jasper Dekoninck et al.
As the mathematical capabilities of large language models (LLMs) improve, it becomes increasingly important to evaluate their performance on research-level tasks at the frontier of mathematical knowledge. However, existing benchmarks are limited, as they focus solely on final-answer questions or high-school competition problems. To address this gap, we introduce IMProofBench, a private benchmark consisting of 39 peer-reviewed problems developed by expert mathematicians. Each problem requires a detailed proof and is paired with subproblems that have final answers, supporting both an evaluation of mathematical reasoning capabilities by human experts and a large-scale quantitative analysis through automated grading. Furthermore, unlike prior benchmarks, the evaluation setup simulates a realistic research environment: models operate in an agentic framework with tools like web search for literature review and mathematical software such as SageMath. Our results show that current LLMs can succeed at the more accessible research-level questions, but still encounter significant difficulties on more challenging problems. Quantitatively, Grok-4 achieves the highest accuracy of 52% on final-answer subproblems, while GPT-5 obtains the best performance for proof generation, achieving a fully correct solution for 22% of problems. IMProofBench will continue to evolve as a dynamic benchmark in collaboration with the mathematical community, ensuring its relevance for evaluating the next generation of LLMs.
PR · Mar 20, 2025
Universal approximation property of neural stochastic differential equations
Anna P. Kwossek, David J. Prömel, Josef Teichmann
We identify various classes of neural networks that are able to approximate continuous functions locally uniformly subject to fixed global linear growth constraints. For such neural networks the associated neural stochastic differential equations can approximate general stochastic differential equations of Itô diffusion type arbitrarily well. Moreover, quantitative error estimates are derived for stochastic differential equations with sufficiently regular coefficients.
CP · Mar 22, 2024
Robust Utility Optimization via a GAN Approach
Florian Krach, Josef Teichmann, Hanna Wutte
Robust utility optimization enables an investor to deal with market uncertainty in a structured way, with the goal of maximizing the worst-case outcome. In this work, we propose a generative adversarial network (GAN) approach to (approximately) solve robust utility optimization problems in general and realistic settings. In particular, we model both the investor and the market by neural networks (NN) and train them in a mini-max zero-sum game. This approach is applicable for any continuous utility function and in realistic market settings with trading costs, where only observable information of the market can be used. A large empirical study shows the versatile usability of our method. Whenever an optimal reference strategy is available, our method performs on par with it and in the (many) settings without known optimal strategy, our method outperforms all other reference strategies. Moreover, we can conclude from our study that the trained path-dependent strategies do not outperform Markovian ones. Lastly, we uncover that our generative approach for learning optimal, (non-) robust investments under trading costs generates universally applicable alternatives to well known asymptotic strategies of idealized settings.
ML · Oct 3, 2025
Neural Jump ODEs as Generative Models
Robert A. Crowell, Florian Krach, Josef Teichmann
In this work, we explore how Neural Jump ODEs (NJODEs) can be used as generative models for Itô processes. Given (discrete observations of) samples of a fixed underlying Itô process, the NJODE framework can be used to approximate the drift and diffusion coefficients of the process. Under standard regularity assumptions on the Itô processes, we prove that, in the limit, we recover the true parameters with our approximation. Hence, using these learned coefficients to sample from the corresponding Itô process generates, in the limit, samples with the same law as the true underlying process. Compared to other generative machine learning models, our approach has the advantage that it does not need adversarial training and can be trained solely as a predictive model on the observed samples without the need to generate any samples during training to empirically approximate the distribution. Moreover, the NJODE framework naturally deals with irregularly sampled data with missing values as well as with path-dependent dynamics, allowing this approach to be applied in real-world settings. In particular, in the case of path-dependent coefficients of the Itô processes, the NJODE learns their optimal approximation given the past observations and therefore allows generating new paths conditionally on discrete, irregular, and incomplete past observations in an optimal way.
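Once drift and diffusion coefficients are available, new paths can be generated with a standard Euler-Maruyama scheme. In the sketch below, the learned NJODE estimates are replaced, purely for illustration, by simple regressions on the increments of a simulated Ornstein-Uhlenbeck process:

- The drift is estimated from the conditional mean of $\Delta X / \Delta t$.
- The squared diffusion coefficient is estimated from the mean of the squared, drift-corrected increments divided by $\Delta t$.

The true parameters $\kappa = 2$, $\sigma = 0.5$, the linear-drift and constant-diffusion forms, and the step size are all assumptions of the example.

```python
import numpy as np

def euler_maruyama(drift, diffusion, x0, dt, n_steps, n_paths, rng):
    """Sample paths of dX = drift(X) dt + diffusion(X) dW; returns shape (n_paths, n_steps + 1)."""
    x = np.full(n_paths, x0, dtype=float)
    out = [x.copy()]
    for _ in range(n_steps):
        x = x + drift(x) * dt + diffusion(x) * np.sqrt(dt) * rng.standard_normal(n_paths)
        out.append(x.copy())
    return np.stack(out, axis=1)

rng = np.random.default_rng(7)
kappa, sigma, dt = 2.0, 0.5, 0.01  # assumed true OU parameters and step size
# "Observed" training data: OU paths simulated with the true coefficients.
data = euler_maruyama(lambda x: -kappa * x, lambda x: sigma + 0 * x, 1.0, dt, 100, 2000, rng)

# Linear drift b(x) = beta * x fitted to increments (stand-in for a learned conditional mean).
x_prev, dx = data[:, :-1].ravel(), np.diff(data, axis=1).ravel()
beta = np.sum(x_prev * dx) / (np.sum(x_prev**2) * dt)
# Constant diffusion from drift-corrected squared increments (stand-in for a learned conditional variance).
sigma_hat = np.sqrt(np.mean((dx - beta * x_prev * dt) ** 2) / dt)
print(f"kappa_hat={-beta:.2f}, sigma_hat={sigma_hat:.3f}")

# Generate new samples from the estimated model.
samples = euler_maruyama(lambda x: beta * x, lambda x: sigma_hat + 0 * x, 1.0, dt, 100, 500, rng)
```

The generator step needs only the fitted coefficients, which is why no adversarial training or sampling during training is required.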
AP · Sep 30, 2025
Revealing the temporal dynamics of antibiotic anomalies in the infant gut microbiome with neural jump ODEs
Anja Adamov, Markus Chardonnet, Florian Krach et al. · berkeley, eth-zurich
Detecting anomalies in irregularly sampled multi-variate time-series is challenging, especially in data-scarce settings. Here we introduce an anomaly detection framework for irregularly sampled time-series that leverages neural jump ordinary differential equations (NJODEs). The method infers conditional mean and variance trajectories in a fully path dependent way and computes anomaly scores. On synthetic data containing jump, drift, diffusion, and noise anomalies, the framework accurately identifies diverse deviations. Applied to infant gut microbiome trajectories, it delineates the magnitude and persistence of antibiotic-induced disruptions: revealing prolonged anomalies after second antibiotic courses, extended duration treatments, and exposures during the second year of life. We further demonstrate the predictive capabilities of the inferred anomaly scores in accurately predicting antibiotic events and outperforming diversity-based baselines. Our approach accommodates unevenly spaced longitudinal observations, adjusts for static and dynamic covariates, and provides a foundation for inferring microbial anomalies induced by perturbations, offering a translational opportunity to optimize intervention regimens by minimizing microbial disruptions.
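Given predicted conditional means and variances at each observation time, a simple anomaly score is the standardized deviation of the new observation from its forecast. The abstract does not state the exact score. The sketch below therefore assumes an absolute Gaussian z-score with an illustrative threshold of 3. The arrays `mu` and `var` are hypothetical stand-ins for the NJODE outputs.

```python
import numpy as np

def anomaly_scores(obs, mu, var, eps=1e-8):
    """Elementwise absolute z-scores |obs - mu| / sqrt(var) (assumed score definition)."""
    z = (np.asarray(obs) - np.asarray(mu)) / np.sqrt(np.asarray(var) + eps)
    return np.abs(z)

obs = np.array([0.10, 0.05, 2.40, 0.20])  # one coordinate observed at 4 irregular times
mu = np.array([0.00, 0.10, 0.15, 0.18])   # hypothetical predicted conditional means
var = np.array([0.04, 0.04, 0.05, 0.05])  # hypothetical predicted conditional variances
scores = anomaly_scores(obs, mu, var)
flagged = scores > 3.0                    # illustrative threshold
print(scores.round(2), flagged)
```

Because the forecasts depend on the whole observed history, the same observed value can score differently depending on the path that preceded it.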
CA · Feb 5, 2025
Signature Reconstruction from Randomized Signatures
Mie Glückstad, Nicola Muca Cirone, Josef Teichmann
Controlled ordinary differential equations driven by continuous bounded variation curves can be considered a continuous time analogue of recurrent neural networks for the construction of expressive features of the input curves. We ask up to which extent well known signature features of such curves can be reconstructed from controlled ordinary differential equations with (untrained) random vector fields. The answer turns out to be algebraically involved, but essentially the number of signature features, which can be reconstructed from the non-linear flow of the controlled ordinary differential equation, is exponential in its hidden dimension, when the vector fields are chosen to be neural with depth two. Moreover, we characterize a general linear independence condition on arbitrary vector fields, under which the signature features up to some fixed order can always be reconstructed. Algebraically speaking this complements in a quantitative manner several well known results from the theory of Lie algebras of vector fields and puts them in a context of machine learning.
CP · Jan 7, 2022
Applications of Signature Methods to Market Anomaly Detection
Erdinc Akyildirim, Matteo Gambara, Josef Teichmann et al.
Anomaly detection is the process of identifying abnormal instances or events in data sets which deviate from the norm significantly. In this study, we propose a signature-based machine learning algorithm to detect rare or unexpected items in a given data set of time series type. We present applications of signature or randomized signature as feature extractors for anomaly detection algorithms; additionally, we provide an easy, representation-theoretic justification for the construction of randomized signatures. Our first application is based on synthetic data and aims at distinguishing between real and fake trajectories of stock prices, which are indistinguishable by visual inspection. We also show a real-life application by using transaction data from the cryptocurrency market. In this case, we are able to identify pump and dump attempts organized on social networks with F1 scores up to 88% by means of our unsupervised learning algorithm, thus achieving results that are close to the state-of-the-art in the field based on supervised learning.
LG · Jan 2, 2022
On the effectiveness of Randomized Signatures as Reservoir for Learning Rough Dynamics
Enea Monzio Compagnoni, Anna Scampicchio, Luca Biggio et al.
Many finance, physics, and engineering phenomena are modeled by continuous-time dynamical systems driven by highly irregular (stochastic) inputs. A powerful tool to perform time series analysis in this context is rooted in rough path theory and leverages the so-called Signature Transform. This algorithm enjoys strong theoretical guarantees but is hard to scale to high-dimensional data. In this paper, we study a recently derived random projection variant called Randomized Signature, obtained using the Johnson-Lindenstrauss Lemma. We provide an in-depth experimental evaluation of the effectiveness of the Randomized Signature approach, in an attempt to showcase the advantages of this reservoir to the community. Specifically, we find that this method is preferable to the truncated Signature approach and alternative deep learning techniques in terms of model complexity, training time, accuracy, robustness, and data hungriness.
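A randomized signature replaces the iterated integrals of the signature with the state of a controlled ODE. That ODE is driven by the input path through random, untrained vector fields, and only a linear readout on top of the state is trained. The minimal Euler-discretized sketch below rests on several assumptions:

- The vector fields have the form $\tanh(A_i Z + b_i)$ with Gaussian $A_i$ and $b_i$.
- The readout is fitted by ridge regression.
- The reservoir dimension, scaling, inputs, and target are illustrative.

The fixed seed makes every path share the same random vector fields, as a reservoir requires.

```python
import numpy as np

def randomized_signature(path, dim=64, scale=0.5, seed=0):
    """Euler scheme for dZ = sum_i tanh(A_i Z + b_i) dX^i with fixed random A_i, b_i (assumed form)."""
    rng = np.random.default_rng(seed)      # same seed -> same reservoir for every path
    d = path.shape[1]
    A = scale * rng.standard_normal((d, dim, dim)) / np.sqrt(dim)
    b = rng.standard_normal((d, dim))
    z = np.zeros(dim)
    z[0] = 1.0                             # fixed initial state Z_0
    for dx in np.diff(path, axis=0):
        z = z + sum(np.tanh(A[i] @ z + b[i]) * dx[i] for i in range(d))
    return z

def ridge_readout(features, targets, lam=1e-3):
    """Train only the linear readout on the reservoir features."""
    gram = features.T @ features + lam * np.eye(features.shape[1])
    return np.linalg.solve(gram, features.T @ targets)

rng = np.random.default_rng(3)
paths = np.cumsum(rng.normal(0, 0.1, size=(300, 50, 2)), axis=1)  # random-walk input paths
feats = np.stack([randomized_signature(p) for p in paths])
target = paths[:, -1, 0] * paths[:, -1, 1]                         # illustrative target
w = ridge_readout(feats, target)
print(feats.shape, np.mean((feats @ w - target) ** 2))
```

Compared with a truncated signature, the feature dimension here is fixed by `dim`, independently of the input dimension and the truncation order.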
LG · Dec 31, 2021
How Infinitely Wide Neural Networks Can Benefit from Multi-task Learning -- an Exact Macroscopic Characterization
Jakob Heiss, Josef Teichmann, Hanna Wutte
In practice, multi-task learning (through learning features shared among tasks) is an essential property of deep neural networks (NNs). While infinite-width limits of NNs can provide good intuition for their generalization behavior, the well-known infinite-width limits of NNs in the literature (e.g., neural tangent kernels) assume specific settings in which wide ReLU-NNs behave like shallow Gaussian Processes with a fixed kernel. Consequently, in such settings, these NNs lose their ability to benefit from multi-task learning in the infinite-width limit. In contrast, we prove that optimizing wide ReLU neural networks with at least one hidden layer using L2-regularization on the parameters promotes multi-task learning due to representation learning, also in the limiting regime where the network width tends to infinity. We present an exact quantitative characterization of this infinite-width limit in an appropriate function space that neatly describes multi-task learning.
ML · Apr 28, 2021
Optimal Stopping via Randomized Neural Networks
Calypso Herrera, Florian Krach, Pierre Ruyssen et al.
This paper presents the benefits of using randomized neural networks instead of standard basis functions or deep neural networks to approximate the solutions of optimal stopping problems. The key idea is to use neural networks, where the parameters of the hidden layers are generated randomly and only the last layer is trained, in order to approximate the continuation value. Our approaches are applicable to high dimensional problems where the existing approaches become increasingly impractical. In addition, since our approaches can be optimized using simple linear regression, they are easy to implement and theoretical guarantees can be provided. We test our approaches for American option pricing on Black-Scholes, Heston and rough Heston models and for optimally stopping a fractional Brownian motion. In all cases, our algorithms outperform the state-of-the-art and other relevant machine learning approaches in terms of computation time while achieving comparable results. Moreover, we show that they can also be used to efficiently compute Greeks of American options.
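The backward induction described above can be sketched as a Longstaff-Schwartz-type regression in which the basis functions are the outputs of a hidden layer with random, frozen weights. Each step is then an ordinary least-squares fit. This is a minimal sketch, not the authors' code, and it rests on several assumptions:

- The contract is a Black-Scholes American put with illustrative parameters $S_0 = 36$, $K = 40$, $r = 6\%$, $\sigma = 0.2$, $T = 1$.
- The hidden layer uses tanh activations, has width 20, and takes the normalized price $S/K$ as input.
- The regression uses in-the-money paths only.

```python
import numpy as np

def american_put_rlsm(s0=36.0, strike=40.0, r=0.06, sigma=0.2, maturity=1.0,
                      n_steps=50, n_paths=20_000, width=20, seed=0):
    """American put via backward induction with a random (untrained) hidden layer as regression basis."""
    rng = np.random.default_rng(seed)
    dt = maturity / n_steps
    disc = np.exp(-r * dt)
    # Risk-neutral Black-Scholes paths at t_0, ..., t_N.
    z = rng.standard_normal((n_paths, n_steps))
    log_s = np.cumsum((r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z, axis=1)
    s = s0 * np.exp(np.hstack([np.zeros((n_paths, 1)), log_s]))

    def payoff(x):
        return np.maximum(strike - x, 0.0)

    # Random hidden layer, drawn once and never trained (assumed tanh activation).
    w = rng.standard_normal(width)
    b = rng.standard_normal(width)

    def features(x):
        h = np.tanh(np.outer(x / strike, w) + b)
        return np.hstack([np.ones((len(x), 1)), h])

    cash = payoff(s[:, -1])              # cash flow if held to maturity
    for k in range(n_steps - 1, 0, -1):
        cash *= disc                     # discount realized cash flows from t_{k+1} to t_k
        itm = payoff(s[:, k]) > 0
        if not itm.any():
            continue
        phi = features(s[itm, k])
        coef, *_ = np.linalg.lstsq(phi, cash[itm], rcond=None)  # train the last layer only
        exercise = payoff(s[itm, k]) > phi @ coef               # exercise if payoff beats continuation
        idx = np.flatnonzero(itm)[exercise]
        cash[idx] = payoff(s[idx, k])
    # Discount from t_1 to t_0; immediate exercise at t_0 is also allowed.
    return max(payoff(np.array([s0]))[0], disc * cash.mean())

price = american_put_rlsm()
print(f"American put estimate: {price:.3f}")
```

Replacing the random features with a fixed polynomial basis recovers classical least-squares Monte Carlo. Keeping the hidden layer random preserves the property emphasized in the abstract, namely that only a linear regression is solved.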
LG · Feb 26, 2021
NOMU: Neural Optimization-based Model Uncertainty
Jakob Heiss, Jakob Weissteiner, Hanna Wutte et al.
We study methods for estimating model uncertainty for neural networks (NNs) in regression. To isolate the effect of model uncertainty, we focus on a noiseless setting with scarce training data. We introduce five important desiderata regarding model uncertainty that any method should satisfy. However, we find that established benchmarks often fail to reliably capture some of these desiderata, even those that are required by Bayesian theory. To address this, we introduce a new approach for capturing model uncertainty for NNs, which we call Neural Optimization-based Model Uncertainty (NOMU). The main idea of NOMU is to design a network architecture consisting of two connected sub-NNs, one for model prediction and one for model uncertainty, and to train it using a carefully-designed loss function. Importantly, our design enforces that NOMU satisfies our five desiderata. Due to its modular architecture, NOMU can provide model uncertainty for any given (previously trained) NN if given access to its training data. We evaluate NOMU in various regression tasks and noiseless Bayesian optimization (BO) with costly evaluations. In regression, NOMU performs at least as well as state-of-the-art methods. In BO, NOMU even outperforms all considered benchmarks.
NE · Sep 17, 2020
Discrete-time signatures and randomness in reservoir computing
Christa Cuchiero, Lukas Gonon, Lyudmila Grigoryeva et al.
A new explanation of the geometric nature of the reservoir computing phenomenon is presented. Reservoir computing is understood in the literature as the possibility of approximating input/output systems with randomly chosen recurrent neural systems and a trained linear readout layer. Light is shed on this phenomenon by constructing what are called strongly universal reservoir systems as random projections of a family of state-space systems that generate Volterra series expansions. This procedure yields a state-affine reservoir system with randomly generated coefficients in a dimension that is logarithmically reduced with respect to the original system. This reservoir system is able to approximate any element in the fading memory filters class just by training a different linear readout for each different filter. Explicit expressions for the probability distributions needed in the generation of the projected reservoir system are stated and bounds for the committed approximation error are provided.
CP · Jun 16, 2020
Consistent Recalibration Models and Deep Calibration
Matteo Gambara, Josef Teichmann
Consistent Recalibration models (CRC) have been introduced to capture, in the necessary generality, the dynamic features of term structures of derivatives' prices. Several approaches have been suggested to tackle this problem, but all of them, including CRC models, suffered from numerical intractabilities mainly due to the presence of complicated drift terms or consistency conditions. We overcome this problem by machine learning techniques, which allow us to store the crucial drift term's information in neural network type functions. This yields, for the first time, dynamic term structure models that can be simulated efficiently.
MLJun 8, 2020
Neural Jump Ordinary Differential Equations: Consistent Continuous-Time Prediction and FilteringCalypso Herrera, Florian Krach, Josef Teichmann
Combinations of neural ODEs with recurrent neural networks (RNNs), such as GRU-ODE-Bayes or ODE-RNN, are well suited to model irregularly observed time series. While those models outperform existing discrete-time approaches, no theoretical guarantees for their predictive capabilities are available. Assuming that the irregularly sampled time series data originates from a continuous stochastic process, the $L^2$-optimal online prediction is the conditional expectation given the currently available information. We introduce the Neural Jump ODE (NJ-ODE) that provides a data-driven approach to learn, continuously in time, the conditional expectation of a stochastic process. Our approach models the conditional expectation between two observations with a neural ODE and jumps whenever a new observation is made. We define a novel training framework, which allows us to prove theoretical guarantees for the first time. In particular, we show that the output of our model converges to the $L^2$-optimal prediction. This can be interpreted as the solution to a special filtering problem. We provide experiments showing that the theoretical results also hold empirically. Moreover, we experimentally show that our model outperforms the baselines in more complex learning tasks and give comparisons on real-world datasets.
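The jump-and-flow structure can be sketched as follows. This is a hypothetical toy with random, untrained weights and a simplified parametrization: the latent state jumps to rho(x) at each observation and follows an Euler-discretized ODE dh/dt = f(h, s) in between, where s is the time since the last observation. It is not the authors' architecture and omits the training loss.

```python
import math
import random


class TinyNJODE:
    """Toy NJ-ODE-style model with scalar observations (all weights random, untrained)."""

    def __init__(self, hidden=4, seed=0):
        rng = random.Random(seed)

        def u():
            return rng.uniform(-0.5, 0.5)

        self.w_f = [[u() for _ in range(hidden + 1)] for _ in range(hidden)]  # drift f(h, s)
        self.w_rho = [(u(), u()) for _ in range(hidden)]                       # jump map rho(x)
        self.w_g = [u() for _ in range(hidden)]                                # readout g(h)

    def drift(self, h, s):
        z = h + [s]
        return [math.tanh(sum(w * v for w, v in zip(row, z))) for row in self.w_f]

    def jump(self, x):
        return [math.tanh(a * x + b) for a, b in self.w_rho]

    def readout(self, h):
        return sum(w * v for w, v in zip(self.w_g, h))

    def forward(self, observations, n_steps, dt=0.01):
        """observations maps grid index -> observed value; index 0 must be observed."""
        h, s = self.jump(observations[0]), 0.0
        outputs = [self.readout(h)]
        for k in range(1, n_steps + 1):
            # between observations: explicit Euler step of dh/dt = f(h, s),
            # where s is the time elapsed since the last observation
            h = [v + dt * d for v, d in zip(h, self.drift(h, s))]
            s += dt
            if k in observations:
                # at an observation the latent state jumps to rho(x)
                h, s = self.jump(observations[k]), 0.0
            outputs.append(self.readout(h))
        return outputs
```

For instance, `TinyNJODE().forward({0: 1.0, 50: -0.5}, n_steps=100)` returns 101 outputs that evolve continuously except for a jump at grid index 50. In an actual NJ-ODE, f, rho and g are neural networks trained so that the output approximates the conditional expectation.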
CPMay 5, 2020
A generative adversarial network approach to calibration of local stochastic volatility modelsChrista Cuchiero, Wahid Khosrawi, Josef Teichmann
We propose a fully data-driven approach to calibrate local stochastic volatility (LSV) models, circumventing in particular the ad hoc interpolation of the volatility surface. To achieve this, we parametrize the leverage function by a family of feed-forward neural networks and learn their parameters directly from the available market option prices. This should be seen in the context of neural SDEs and (causal) generative adversarial networks: we generate volatility surfaces by specific neural SDEs, whose quality is assessed by quantifying, possibly in an adversarial manner, distances to market prices. The minimization of the calibration functional relies strongly on a variance reduction technique based on hedging and deep hedging, which is interesting in its own right: it allows the calculation of model prices and model implied volatilities in an accurate way using only small sets of sample paths. For numerical illustration we implement a SABR-type LSV model and conduct a thorough statistical performance analysis on many samples of implied volatility smiles, showing the accuracy and stability of the method.
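The hedging-based variance reduction can be illustrated with a control variate. The sketch assumes Black-Scholes dynamics and uses the analytic Black-Scholes delta as the hedge, where the paper would use a (deep) hedging strategy. Since the discounted gains from trading are a martingale with mean zero, subtracting them from the discounted payoff leaves the Monte Carlo mean unchanged while removing most of its variance. All parameters are illustrative.

```python
import math
import random


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_d1(s, k, r, sigma, tau):
    return (math.log(s / k) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))


def bs_call(s, k, r, sigma, tau):
    d1 = bs_d1(s, k, r, sigma, tau)
    return s * norm_cdf(d1) - k * math.exp(-r * tau) * norm_cdf(d1 - sigma * math.sqrt(tau))


def mc_call(s0=100.0, k=100.0, r=0.05, sigma=0.2, t=1.0, steps=50, paths=2000, seed=1):
    """Return plain and hedged (control-variate) samples of the discounted call payoff."""
    rng = random.Random(seed)
    dt = t / steps
    plain, hedged = [], []
    for _ in range(paths):
        s, gain = s0, 0.0
        for i in range(steps):
            ti = i * dt
            delta = norm_cdf(bs_d1(s, k, r, sigma, t - ti))  # hedge ratio known at time ti
            s_next = s * math.exp((r - 0.5 * sigma**2) * dt
                                  + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0))
            # discounted trading gain: a mean-zero martingale increment
            gain += delta * (math.exp(-r * (ti + dt)) * s_next - math.exp(-r * ti) * s)
            s = s_next
        payoff = math.exp(-r * t) * max(s - k, 0.0)
        plain.append(payoff)
        hedged.append(payoff - gain)
    return plain, hedged


def mean_var(xs):
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
```

Comparing `mean_var(plain)` with `mean_var(hedged)` after `plain, hedged = mc_call()` shows both means near `bs_call(100.0, 100.0, 0.05, 0.2, 1.0)`, while the hedged sample variance is far smaller, so far fewer paths are needed for the same accuracy.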
MLApr 28, 2020
Denise: Deep Robust Principal Component Analysis for Positive Semidefinite MatricesCalypso Herrera, Florian Krach, Anastasis Kratsios et al.
The robust PCA of covariance matrices plays an essential role when isolating key explanatory features. The currently available methods for performing such a low-rank plus sparse decomposition are matrix-specific, meaning that these algorithms must be re-run for every new matrix. Since these algorithms are computationally expensive, it is preferable to learn and store a function that performs this decomposition nearly instantaneously when evaluated. Therefore, we introduce Denise, a deep learning-based algorithm for robust PCA of covariance matrices, or more generally, of symmetric positive semidefinite matrices, which learns precisely such a function. Theoretical guarantees for Denise are provided. These include a novel universal approximation theorem adapted to our geometric deep learning problem and convergence to an optimal solution to the learning problem. Our experiments show that Denise matches state-of-the-art performance in terms of decomposition quality, while being approximately $2000\times$ faster than the state-of-the-art, principal component pursuit (PCP), and $200 \times$ faster than the current speed-optimized method, fast PCP.
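To make the target of the learned function concrete, the sketch below evaluates a low-rank plus sparse split of a positive semidefinite matrix M. It assumes that the low-rank part is parametrized as L = U U^T (automatically positive semidefinite with rank at most k) and that sparsity of S = M - L is measured by the entrywise l1 norm. In Denise a network would map M to U; here U is simply given, and the matrices are made up.

```python
def low_rank(u):
    """L = U U^T: symmetric PSD with rank at most k for an n x k factor U."""
    n, k = len(u), len(u[0])
    return [[sum(u[i][c] * u[j][c] for c in range(k)) for j in range(n)] for i in range(n)]


def decompose(m, u):
    """Split M into the low-rank part L = U U^T and the remainder S = M - L."""
    low = low_rank(u)
    s = [[m[i][j] - low[i][j] for j in range(len(m))] for i in range(len(m))]
    return low, s


def l1_objective(m, u):
    """Entrywise l1 norm of S, the usual convex surrogate for sparsity."""
    _, s = decompose(m, u)
    return sum(abs(v) for row in s for v in row)


# A PSD test matrix: a rank-2 part plus two positive diagonal spikes (the sparse part).
u_true = [[1.0, 0.0], [0.5, 1.0], [0.0, 2.0], [1.0, 1.0]]
m = low_rank(u_true)
m[0][0] += 3.0
m[2][2] += 1.5
```

With the correct factor, `l1_objective(m, u_true)` equals the total size of the two spikes (4.5), far below the value for a zero factor, which is the whole l1 mass of M. Learning the map from M to U once is what lets the decomposition be evaluated quickly for new matrices.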
MLApr 27, 2020
Local Lipschitz Bounds of Deep Neural NetworksCalypso Herrera, Florian Krach, Josef Teichmann
The Lipschitz constant is an important quantity that arises in analysing the convergence of gradient-based optimization methods. It is generally unclear how to estimate the Lipschitz constant of a complex model. Thus, this paper studies an important problem that may be useful to the broader area of non-convex optimization. The main result provides a local upper bound on the Lipschitz constants of a multi-layer feed-forward neural network and its gradient. Moreover, lower bounds are established as well, which are used to show that it is impossible to derive global upper bounds for the Lipschitz constants. In contrast to previous works, we compute the Lipschitz constants with respect to the network parameters and not with respect to the inputs. These constants are needed for the theoretical description of many step size schedulers of gradient-based optimization schemes and their convergence analysis. The idea is both simple and effective. The results are extended to a generalization of neural networks, continuously deep neural networks, which are described by controlled ODEs.
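As a hedged illustration of why only local bounds are possible, consider a hypothetical two-parameter linear network f(x) = a·w·x with squared loss on a single data point (x, y) = (1, 0.5). The sketch estimates the Lipschitz constant of the loss gradient with respect to the parameters (a, w) on boxes of growing radius by sampling secant ratios. This yields a lower estimate, not the paper's upper bound, but it shows the constant growing with the size of the parameter region.

```python
import random


def grad(a, w, x=1.0, y=0.5):
    """Gradient of the loss (a*w*x - y)**2 with respect to the parameters (a, w)."""
    e = a * w * x - y
    return 2.0 * e * w * x, 2.0 * e * a * x


def lipschitz_estimate(radius, pairs=2000, seed=0):
    """Largest sampled ratio |grad(p) - grad(q)| / |p - q| over pairs in [-radius, radius]^2."""
    rng = random.Random(seed)
    best = 0.0
    for _ in range(pairs):
        p = (rng.uniform(-radius, radius), rng.uniform(-radius, radius))
        q = (rng.uniform(-radius, radius), rng.uniform(-radius, radius))
        gp, gq = grad(*p), grad(*q)
        num = ((gp[0] - gq[0]) ** 2 + (gp[1] - gq[1]) ** 2) ** 0.5
        den = ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5
        if den > 0.0:
            best = max(best, num / den)
    return best
```

Because the Hessian entries of this loss grow quadratically in (a, w), `lipschitz_estimate(4.0)` is much larger than `lipschitz_estimate(1.0)`, so no single global constant can serve every region of parameter space.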
LGNov 7, 2019
How Implicit Regularization of ReLU Neural Networks Characterizes the Learned Function -- Part I: the 1-D Case of Two Layers with Random First LayerJakob Heiss, Josef Teichmann, Hanna Wutte
In this paper, we consider one-dimensional (shallow) ReLU neural networks in which the first-layer weights are chosen randomly and only the terminal layer is trained. First, we mathematically show that for such networks L2-regularized regression corresponds in function space to regularizing the estimate's second derivative for fairly general loss functionals. For least squares regression, we show that the trained network converges to the smooth spline interpolation of the training data as the number of hidden nodes tends to infinity. Moreover, we derive a novel correspondence between early-stopped gradient descent (without any explicit regularization of the weights) and smoothing spline regression.
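The setting can be sketched directly: a hypothetical one-dimensional ReLU network with a frozen random first layer (Gaussian weights and biases, an assumption) whose terminal layer is fitted by L2-regularized least squares. The sketch covers only the finite-width training step, not the spline limit proved in the paper.

```python
import random


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def relu_features(x, weights, biases):
    return [max(0.0, w * x + b) for w, b in zip(weights, biases)]


def fit_last_layer(xs, ys, n_hidden=200, lam=1e-3, seed=0):
    """Random first layer (frozen); terminal layer fitted by ridge regression."""
    rng = random.Random(seed)
    weights = [rng.gauss(0.0, 1.0) for _ in range(n_hidden)]
    biases = [rng.gauss(0.0, 1.0) for _ in range(n_hidden)]
    phi = [relu_features(x, weights, biases) for x in xs]
    # Dual form of ridge regression: c = (Phi Phi^T + lam I)^{-1} y and v = Phi^T c,
    # which needs a len(xs) x len(xs) solve instead of an n_hidden x n_hidden one.
    gram = [[sum(p * q for p, q in zip(pi, pj)) + (lam if i == j else 0.0)
             for j, pj in enumerate(phi)] for i, pi in enumerate(phi)]
    c = solve(gram, ys)
    v = [sum(c[i] * phi[i][h] for i in range(len(xs))) for h in range(n_hidden)]
    return lambda x: sum(vh * fh for vh, fh in zip(v, relu_features(x, weights, biases)))
```

Increasing `lam` trades training fit for smoothness; in the paper's infinite-width limit this regularization corresponds to penalizing the second derivative of the fitted function.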
PRNov 23, 2009
A new extrapolation method for weak approximation schemes with applicationsKojiro Oshima, Josef Teichmann, Dejan Veluscek
We review Fujiwara's scheme, a sixth-order weak approximation scheme for the numerical approximation of SDEs, and embed it into a general method to construct weak approximation schemes of order $2m$ for $m \in \mathbf{N}$. These schemes cannot be seen as cubature schemes, but rather as universal ways of extrapolating from a lower-order weak approximation scheme, namely the Ninomiya-Victoir scheme, to higher orders.
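The extrapolation idea can be illustrated with classical Richardson extrapolation, a simpler technique swapped in here and not Fujiwara's scheme: combining a weak scheme at step sizes h and h/2 cancels the leading error term. For simplicity the sketch uses the Euler scheme for geometric Brownian motion dX = mu X dt + sigma X dW with test function f(x) = x, for which the Euler expectation has the closed form x0 (1 + mu T/n)^n.

```python
import math


def euler_mean(x0, mu, t, n):
    """E[X_T] under the Euler scheme with n steps (sigma drops out for f(x) = x)."""
    return x0 * (1.0 + mu * t / n) ** n


def richardson(x0, mu, t, n):
    """Combine step sizes h and h/2 to cancel the leading O(h) weak error term."""
    return 2.0 * euler_mean(x0, mu, t, 2 * n) - euler_mean(x0, mu, t, n)


exact = math.exp(1.0)  # E[X_1] = x0 * exp(mu) with x0 = mu = 1
for n in (10, 20, 40):
    print(n, abs(euler_mean(1.0, 1.0, 1.0, n) - exact), abs(richardson(1.0, 1.0, 1.0, n) - exact))
```

Doubling n roughly halves the Euler error but divides the extrapolated error by about four, i.e. the combination is of weak order 2 built from an order-1 scheme.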
PROct 1, 2008
Absolutely continuous laws of Jump-Diffusions in finite and infinite dimensions with applications to mathematical FinanceBarbara Forster, Eva Luetkebohmert, Josef Teichmann
In mathematical finance, calculating the Greeks by Malliavin weights has proved to be a numerically satisfactory procedure for finite-dimensional Itô diffusions. The existence of Malliavin weights relies on absolute continuity of laws of the projected diffusion process and a sufficiently regular density. In this article we first prove results on absolute continuity for laws of projected jump-diffusion processes in finite and infinite dimensions, and a general result on the existence of Malliavin weights in finite dimension. In both cases we assume Hörmander conditions and hypotheses on the invertibility of the so-called linkage operators. The purpose of this article is to show that for the construction of numerical procedures for the calculation of the Greeks in fairly general jump-diffusion cases one can proceed as in the pure diffusion case. We also show how the given results apply to infinite-dimensional questions in mathematical finance. There we start from the Vasiček model and add a jump-diffusion component while preserving the absence of arbitrage. We prove that in this case we obtain an interest rate model in which the law of any projection is absolutely continuous with respect to Lebesgue measure on $\mathbb{R}^M$.
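For the pure diffusion case that serves as the template above, the classical Malliavin weight for the delta of a European payoff under Black-Scholes is W_T / (s0 sigma T), so that delta = e^{-rT} E[(S_T - K)^+ W_T / (s0 sigma T)]. The Monte Carlo sketch below assumes that model and illustrative parameters; it does not implement the jump-diffusion weights of the paper.

```python
import math
import random


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def malliavin_delta(s0=100.0, k=100.0, r=0.05, sigma=0.2, t=1.0, paths=100_000, seed=7):
    """Call delta as E[payoff * weight]: the payoff itself is never differentiated."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(paths):
        w_t = math.sqrt(t) * rng.gauss(0.0, 1.0)
        s_t = s0 * math.exp((r - 0.5 * sigma**2) * t + sigma * w_t)
        total += max(s_t - k, 0.0) * w_t / (s0 * sigma * t)
    return math.exp(-r * t) * total / paths


def bs_delta(s0=100.0, k=100.0, r=0.05, sigma=0.2, t=1.0):
    """Analytic Black-Scholes call delta N(d1), for comparison."""
    return norm_cdf((math.log(s0 / k) + (r + 0.5 * sigma**2) * t) / (sigma * math.sqrt(t)))
```

The weight approach needs no differentiability of the payoff, which is why it extends to digital options and, under the conditions discussed above, to jump-diffusions.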
PRMay 4, 2005
Calculating the Greeks by Cubature formulasJosef Teichmann
We provide cubature formulas for the calculation of derivatives of expected values in the spirit of Terry Lyons and Nicolas Victoir. In financial mathematics, derivatives of option prices with respect to initial values, the so-called Greeks, are of particular importance as hedging parameters. Cubature formulas allow these quantities to be calculated very quickly. Simple examples are added to the theoretical exposition.
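A minimal sketch of cubature on Wiener space for a Black-Scholes call, assuming the standard one-dimensional degree-3 formula (on each step of length h, the two paths s ↦ ±s/√h with weight 1/2 each) rather than the paper's specific construction. Solving the Stratonovich ODE dS = (r - sigma^2/2) S dt + sigma S dω along these paths gives a recombining tree; the delta is then read off through the pathwise derivative dS_T/ds0 = S_T/s0.

```python
import math


def cubature_price_delta(s0=100.0, k=100.0, r=0.05, sigma=0.2, t=1.0, n=400):
    """Degree-3 Wiener-space cubature iterated over n steps (a recombining tree)."""
    h = t / n
    # ODE solution over one step along the paths +s/sqrt(h) and -s/sqrt(h)
    up = math.exp((r - 0.5 * sigma**2) * h + sigma * math.sqrt(h))
    down = math.exp((r - 0.5 * sigma**2) * h - sigma * math.sqrt(h))
    disc = math.exp(-r * t)
    price = delta = 0.0
    for j in range(n + 1):
        weight = math.comb(n, j) / 2**n  # total weight of all path sequences with j up-moves
        s_t = s0 * up**j * down ** (n - j)
        price += weight * max(s_t - k, 0.0)
        if s_t > k:
            delta += weight * s_t / s0  # pathwise derivative of the payoff w.r.t. s0
    return disc * price, disc * delta
```

With the default n = 400 steps, the price and delta land close to the Black-Scholes values (about 10.45 and 0.637 for these parameters) using only n + 1 evaluations instead of many random paths.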
NAMay 4, 2005
The proof of Tchakaloff's TheoremChristian Bayer, Josef Teichmann
We provide a simple proof of Tchakaloff's Theorem on the existence of cubature formulas of degree $m$ for Borel measures with moments up to order $m$. The result improves known results for non-compact supports, since we do not need conditions on $(m+1)$st moments.
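As a concrete instance of a cubature formula for a measure with non-compact support, the three-point Gauss-Hermite rule (nodes 0 and ±√3 with weights 2/3, 1/6, 1/6) reproduces the moments of the standard normal law up to degree 5, but not degree 6. The sketch checks this numerically; it illustrates the notion of degree only, not Tchakaloff's existence argument.

```python
import math

# Three-point Gauss-Hermite rule for the standard normal law (probabilists' convention)
NODES = [-math.sqrt(3.0), 0.0, math.sqrt(3.0)]
WEIGHTS = [1.0 / 6.0, 2.0 / 3.0, 1.0 / 6.0]


def normal_moment(p):
    """E[X^p] for X ~ N(0, 1): 0 for odd p, (p - 1)!! for even p."""
    if p % 2:
        return 0.0
    return float(math.prod(range(p - 1, 0, -2)))


def rule_moment(p):
    """Moment of order p of the discrete measure defined by the cubature rule."""
    return sum(w * x**p for w, x in zip(WEIGHTS, NODES))
```

The rule is exact for every polynomial of degree at most 5, while `rule_moment(6)` gives 9 instead of the Gaussian value 15.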