SYJun 11, 2020
Convex Analysis for LQG Systems with Applications to Major Minor LQG Mean-Field Game SystemsDena Firoozi, Sebastian Jaimungal, Peter E. Caines
We develop a convex analysis approach for solving LQG optimal control problems and apply it to major-minor (MM) LQG mean-field game (MFG) systems. The approach retrieves the best response strategies for the major agent and all minor agents that attain an $ε$-Nash equilibrium. An important and distinctive advantage to this approach is that unlike the classical approach in the literature, we are able to avoid imposing assumptions on the evolution of the mean-field. In particular, this provides a tool for dealing with complex and non-standard systems.
CPMar 27, 2023
Robust Risk-Aware Option HedgingDavid Wu, Sebastian Jaimungal
The objectives of option hedging/trading extend beyond mere protection against downside risks, with a desire to seek gains also driving agent's strategies. In this study, we showcase the potential of robust risk-aware reinforcement learning (RL) in mitigating the risks associated with path-dependent financial derivatives. We accomplish this by leveraging a policy gradient approach that optimises robust risk-aware performance criteria. We specifically apply this methodology to the hedging of barrier options, and highlight how the optimal hedging strategy undergoes distortions as the agent moves from being risk-averse to risk-seeking. As well as how the agent robustifies their strategy. We further investigate the performance of the hedge when the data generating process (DGP) varies from the training DGP, and demonstrate that the robust strategies outperform the non-robust ones.
LGJun 29, 2022
Conditionally Elicitable Dynamic Risk Measures for Deep Reinforcement LearningAnthony Coache, Sebastian Jaimungal, Álvaro Cartea
We propose a novel framework to solve risk-sensitive reinforcement learning (RL) problems where the agent optimises time-consistent dynamic spectral risk measures. Based on the notion of conditional elicitability, our methodology constructs (strictly consistent) scoring functions that are used as penalizers in the estimation procedure. Our contribution is threefold: we (i) devise an efficient approach to estimate a class of dynamic spectral risk measures with deep neural networks, (ii) prove that these dynamic spectral risk measures may be approximated to any arbitrary accuracy using deep neural networks, and (iii) develop a risk-sensitive actor-critic algorithm that uses full episodes and does not require any additional nested transitions. We compare our conceptually improved reinforcement learning algorithm with the nested simulation approach and illustrate its performance in two settings: statistical arbitrage and portfolio allocation on both simulated and real data.
CPMar 1, 2023
FuNVol: A Multi-Asset Implied Volatility Market Simulator using Functional Principal Components and Neural SDEsVedant Choudhary, Sebastian Jaimungal, Maxime Bergeron
We introduce a new approach for generating sequences of implied volatility (IV) surfaces across multiple assets that is faithful to historical prices. We do so using a combination of functional data analysis and neural stochastic differential equations (SDEs) combined with a probability integral transform penalty to reduce model misspecification. We demonstrate that learning the joint dynamics of IV surfaces and prices produces market scenarios that are consistent with historical features and lie within the sub-manifold of surfaces that are essentially free of static arbitrage. Finally, we demonstrate that delta hedging using the simulated surfaces generates profit and loss (P&L) distributions that are consistent with realised P&Ls.
OCJul 23, 2019
Mean Field Game Systems with Common Noise and Markovian Latent ProcessesDena Firoozi, Peter E. Caines, Sebastian Jaimungal
In many stochastic games stemming from financial models, the environment evolves with latent factors and there may be common noise across agents' states. Two classic examples are: (i) multi-agent trading on electronic exchanges, and (ii) systemic risk induced through inter-bank lending/borrowing. Moreover, agents' actions often affect the environment, and some agent's may be small while others large. Hence sub-population of agents may act as minor agents, while another class may act as major agents. To capture the essence of such problems, here, we introduce a general class of non-cooperative heterogeneous stochastic games with one major agent and a large population of minor agents where agents interact with an observed common process impacted by the mean field. A latent Markov chain and a latent Wiener process (common noise) modulate the common process, and agents cannot observe them. We use filtering techniques coupled with a convex analysis approach to (i) solve the mean field game limit of the problem, (ii) demonstrate that the best response strategies generate an $ε$-Nash equilibrium for finite populations, and (iii) obtain explicit characterisations of the best response strategies.
LGSep 16, 2024
Robust Reinforcement Learning with Dynamic Distortion Risk MeasuresAnthony Coache, Sebastian Jaimungal
In a reinforcement learning (RL) setting, the agent's optimal strategy heavily depends on her risk preferences and the underlying model dynamics of the training environment. These two aspects influence the agent's ability to make well-informed and time-consistent decisions when facing testing environments. In this work, we devise a framework to solve robust risk-aware RL problems where we simultaneously account for environmental uncertainty and risk with a class of dynamic robust distortion risk measures. Robustness is introduced by considering all models within a Wasserstein ball around a reference model. We estimate such dynamic robust risk measures using neural networks by making use of strictly consistent scoring functions, derive policy gradient formulae using the quantile representation of distortion risk measures, and construct an actor-critic algorithm to solve this class of robust risk-aware RL problems. We demonstrate the performance of our algorithm on a portfolio allocation example.
MLAug 16, 2023
Eliciting Risk Aversion with Inverse Reinforcement Learning via Interactive QuestioningZiteng Cheng, Anthony Coache, Sebastian Jaimungal
We investigate a framework for robo-advisors to estimate non-expert clients' risk aversion using adaptive binary-choice questionnaires. We model risk aversion using cost functions and spectral risk measures in a static setting. We prove the finite-sample identifiability and, for properly designed questions, obtain a convergence rate of $\sqrt{N}$ up to a logarithmic factor, where $N$ is the number of questions. We introduce the notion of distinguishing power and demonstrate, through simulated experiments, that designing questions by maximizing distinguishing power achieves satisfactory accuracy in learning risk aversion with fewer than 50 questions. We also provide a preliminary investigation of an infinite-horizon setting with an additional discount factor for dynamic risk aversion, establishing qualitative identifiability in this case.
LGDec 16, 2025
Deep Learning and Elicitability for McKean-Vlasov FBSDEs With Common NoiseFelipe J. P. Antunes, Yuri F. Saporito, Sebastian Jaimungal
We present a novel numerical method for solving McKean-Vlasov forward-backward stochastic differential equations (MV-FBSDEs) with common noise, combining Picard iterations, elicitability and deep learning. The key innovation involves elicitability to derive a path-wise loss function, enabling efficient training of neural networks to approximate both the backward process and the conditional expectations arising from common noise - without requiring computationally expensive nested Monte Carlo simulations. The mean-field interaction term is parameterized via a recurrent neural network trained to minimize an elicitable score, while the backward process is approximated through a feedforward network representing the decoupling field. We validate the algorithm on a systemic risk inter-bank borrowing and lending model, where analytical solutions exist, demonstrating accurate recovery of the true solution. We further extend the model to quantile-mediated interactions, showcasing the flexibility of the elicitability framework beyond conditional means or moments. Finally, we apply the method to a non-stationary Aiyagari--Bewley--Huggett economic growth model with endogenous interest rates, illustrating its applicability to complex mean-field games without closed-form solutions.
LGFeb 27, 2023
Distributional Method for Risk Averse Reinforcement LearningZiteng Cheng, Sebastian Jaimungal, Nick Martin
We introduce a distributional method for learning the optimal policy in risk averse Markov decision process with finite state action spaces, latent costs, and stationary dynamics. We assume sequential observations of states, actions, and costs and assess the performance of a policy using dynamic risk measures constructed from nested Kusuoka-type conditional risk mappings. For such performance criteria, randomized policies may outperform deterministic policies, therefore, the candidate policies lie in the d-dimensional simplex where d is the cardinality of the action space. Existing risk averse reinforcement learning methods seldom concern randomized policies, naïve extensions to current setting suffer from the curse of dimensionality. By exploiting certain structures embedded in the corresponding dynamic programming principle, we propose a distributional learning method for seeking the optimal policy. The conditional distribution of the value function is casted into a specific type of function, which is chosen with in mind the ease of risk averse optimization. We use a deep neural network to approximate said function, illustrate that the proposed method avoids the curse of dimensionality in the exploration phase, and explore the method's performance with a wide range of model parameters that are picked randomly.
MLJun 13, 2024Code
Learning conditional distributions on continuous spacesCyril Bénézet, Ziteng Cheng, Sebastian Jaimungal
We investigate sample-based learning of conditional distributions on multi-dimensional unit boxes, allowing for different dimensions of the feature and target spaces. Our approach involves clustering data near varying query points in the feature space to create empirical measures in the target space. We employ two distinct clustering schemes: one based on a fixed-radius ball and the other on nearest neighbors. We establish upper bounds for the convergence rates of both methods and, from these bounds, deduce optimal configurations for the radius and the number of neighbors. We propose to incorporate the nearest neighbors method into neural network training, as our empirical analysis indicates it has better performance in practice. For efficiency, our training process utilizes approximate nearest neighbors search with random binary space partitioning. Additionally, we employ the Sinkhorn algorithm and a sparsity-enforced transport plan. Our empirical findings demonstrate that, with a suitably designed structure, the neural network has the ability to adapt to a suitable level of Lipschitz continuity locally. For reproducibility, our code is available at \url{https://github.com/zcheng-a/LCD_kNN}.
MFApr 15, 2025
Multi-Agent Reinforcement Learning for Greenhouse Gas Offset Credit MarketsLiam Welsh, Udit Grover, Sebastian Jaimungal
Climate change is a major threat to the future of humanity, and its impacts are being intensified by excess man-made greenhouse gas emissions. One method governments can employ to control these emissions is to provide firms with emission limits and penalize any excess emissions above the limit. Excess emissions may also be offset by firms who choose to invest in carbon reducing and capturing projects. These projects generate offset credits which can be submitted to a regulating agency to offset a firm's excess emissions, or they can be traded with other firms. In this work, we characterize the finite-agent Nash equilibrium for offset credit markets. As computing Nash equilibria is an NP-hard problem, we utilize the modern reinforcement learning technique Nash-DQN to efficiently estimate the market's Nash equilibria. We demonstrate not only the validity of employing reinforcement learning methods applied to climate themed financial markets, but also the significant financial savings emitting firms may achieve when abiding by the Nash equilibria through numerical experiments.
LGDec 26, 2021
Reinforcement Learning with Dynamic Convex Risk MeasuresAnthony Coache, Sebastian Jaimungal
We develop an approach for solving time-consistent risk-sensitive stochastic optimization problems using model-free reinforcement learning (RL). Specifically, we assume agents assess the risk of a sequence of random variables using dynamic convex risk measures. We employ a time-consistent dynamic programming principle to determine the value of a particular policy, and develop policy gradient update rules that aid in obtaining optimal policies. We further develop an actor-critic style algorithm using neural networks to optimize over policies. Finally, we demonstrate the performance and flexibility of our approach by applying it to three optimization problems: statistical arbitrage trading strategies, financial hedging, and obstacle avoidance robot control.
LGOct 3, 2021
Deep Learning for Principal-Agent Mean Field GamesSteven Campbell, Yichao Chen, Arvind Shrivats et al.
Here, we develop a deep learning algorithm for solving Principal-Agent (PA) mean field games with market-clearing conditions -- a class of problems that have thus far not been studied and one that poses difficulties for standard numerical methods. We use an actor-critic approach to optimization, where the agents form a Nash equilibria according to the principal's penalty function, and the principal evaluates the resulting equilibria. The inner problem's Nash equilibria is obtained using a variant of the deep backward stochastic differential equation (BSDE) method modified for McKean-Vlasov forward-backward SDEs that includes dependence on the distribution over both the forward and backward processes. The outer problem's loss is further approximated by a neural net by sampling over the space of penalty functions. We apply our approach to a stylized PA problem arising in Renewable Energy Certificate (REC) markets, where agents may rent clean energy production capacity, trade RECs, and expand their long-term capacity to navigate the market at maximum profit. Our numerical results illustrate the efficacy of the algorithm and lead to interesting insights into the nature of optimal PA interactions in the mean-field limit of these markets.
LGAug 23, 2021
Robust Risk-Aware Reinforcement LearningSebastian Jaimungal, Silvana Pesenti, Ye Sheng Wang et al.
We present a reinforcement learning (RL) approach for robust optimisation of risk-aware performance criteria. To allow agents to express a wide variety of risk-reward profiles, we assess the value of a policy using rank dependent expected utility (RDEU). RDEU allows the agent to seek gains, while simultaneously protecting themselves against downside risk. To robustify optimal policies against model uncertainty, we assess a policy not by its distribution, but rather, by the worst possible distribution that lies within a Wasserstein ball around it. Thus, our problem formulation may be viewed as an actor/agent choosing a policy (the outer problem), and the adversary then acting to worsen the performance of that strategy (the inner problem). We develop explicit policy gradient formulae for the inner and outer problems, and show its efficacy on three prototypical financial problems: robust portfolio allocation, optimising a benchmark, and statistical arbitrage.
MFAug 10, 2021
Arbitrage-Free Implied Volatility Surface Generation with Variational AutoencodersBrian Ning, Sebastian Jaimungal, Xiaorong Zhang et al.
We propose a hybrid method for generating arbitrage-free implied volatility (IV) surfaces consistent with historical data by combining model-free Variational Autoencoders (VAEs) with continuous time stochastic differential equation (SDE) driven models. We focus on two classes of SDE models: regime switching models and Lévy additive processes. By projecting historical surfaces onto the space of SDE model parameters, we obtain a distribution on the parameter subspace faithful to the data on which we then train a VAE. Arbitrage-free IV surfaces are then generated by sampling from the posterior distribution on the latent space, decoding to obtain SDE model parameters, and finally mapping those parameters to IV surfaces. We further refine the VAE model by including conditional features and demonstrate its superior generative out-of-sample performance.
OCNov 25, 2020
Exploratory LQG Mean Field Games with Entropy RegularizationDena Firoozi, Sebastian Jaimungal
We study a general class of entropy-regularized multi-variate LQG mean field games (MFGs) in continuous time with $K$ distinct sub-population of agents. We extend the notion of actions to action distributions (exploratory actions), and explicitly derive the optimal action distributions for individual agents in the limiting MFG. We demonstrate that the optimal set of action distributions yields an $ε$-Nash equilibrium for the finite-population entropy-regularized MFG. Furthermore, we compare the resulting solutions with those of classical LQG MFGs and establish the equivalence of their existence.
LGApr 23, 2019
Deep Q-Learning for Nash Equilibria: Nash-DQNPhilippe Casgrain, Brian Ning, Sebastian Jaimungal
Model-free learning for multi-agent stochastic games is an active area of research. Existing reinforcement learning algorithms, however, are often restricted to zero-sum games, and are applicable only in small state-action spaces or other simplified settings. Here, we develop a new data efficient Deep-Q-learning methodology for model-free learning of Nash equilibria for general-sum stochastic games. The algorithm uses a local linear-quadratic expansion of the stochastic game, which leads to analytically solvable optimal actions. The expansion is parametrized by deep neural networks to give it sufficient flexibility to learn the environment without the need to experience all state-action pairs. We study symmetry properties of the algorithm stemming from label-invariant stochastic games and as a proof of concept, apply our algorithm to learning optimal trading strategies in competitive electronic markets.
PMMar 16, 2019
Active and Passive Portfolio Management with Latent FactorsAli Al-Aradi, Sebastian Jaimungal
We address a portfolio selection problem that combines active (outperformance) and passive (tracking) objectives using techniques from convex analysis. We assume a general semimartingale market model where the assets' growth rate processes are driven by a latent factor. Using techniques from convex analysis we obtain a closed-form solution for the optimal portfolio and provide a theorem establishing its uniqueness. The motivation for incorporating latent factors is to achieve improved growth rate estimation, an otherwise notoriously difficult task. To this end, we focus on a model where growth rates are driven by an unobservable Markov chain. The solution in this case requires a filtering step to obtain posterior probabilities for the state of the Markov chain from asset price information, which are subsequently used to find the optimal allocation. We show the optimal strategy is the posterior average of the optimal strategies the investor would have held in each state assuming the Markov chain remains in that state. Finally, we implement a number of historical backtests to demonstrate the performance of the optimal portfolio.
TRDec 17, 2018
Double Deep Q-Learning for Optimal ExecutionBrian Ning, Franco Ho Ting Lin, Sebastian Jaimungal
Optimal trade execution is an important problem faced by essentially all traders. Much research into optimal execution uses stringent model assumptions and applies continuous time stochastic control to solve them. Here, we instead take a model free approach and develop a variation of Deep Q-Learning to estimate the optimal actions of a trader. The model is a fully connected Neural Network trained using Experience Replay and Double DQN with input features given by the current state of the limit order book, other trading signals, and available execution actions, while the output is the Q-value function estimating the future rewards under an arbitrary action. We apply our model to nine different stocks and find that it outperforms the standard benchmark approach on most stocks using the measures of (i) mean and median out-performance, (ii) probability of out-performance, and (iii) gain-loss ratios.
MFJun 12, 2018
Trading algorithms with learning in latent alpha modelsPhilippe Casgrain, Sebastian Jaimungal
Alpha signals for statistical arbitrage strategies are often driven by latent factors. This paper analyses how to optimally trade with latent factors that cause prices to jump and diffuse. Moreover, we account for the effect of the trader's actions on quoted prices and the prices they receive from trading. Under fairly general assumptions, we demonstrate how the trader can learn the posterior distribution over the latent states, and explicitly solve the latent optimal trading problem. We provide a verification theorem, and a methodology for calibrating the model by deriving a variation of the expectation-maximization algorithm. To illustrate the efficacy of the optimal strategy, we demonstrate its performance through simulations and compare it to strategies which ignore learning in the latent factors. We also provide calibration results for a particular model using Intel Corporation stock as an example.