A. Max Reppen

2papers

2 Papers

MLNov 18, 2020
Deep Empirical Risk Minimization in finance: looking into the future

A. Max Reppen, H. Mete Soner

Many modern computational approaches to classical problems in quantitative finance are formulated as empirical loss minimization (ERM), allowing direct applications of classical results from statistical machine learning. These methods, designed to directly construct the optimal feedback representation of hedging or investment decisions, are analyzed in this framework demonstrating their effectiveness as well as their susceptibility to generalization error. Use of classical techniques shows that over-training renders trained investment decisions to become anticipative, and proves overlearning for large hypothesis spaces. On the other hand, non-asymptotic estimates based on Rademacher complexity show the convergence for sufficiently large training sets. These results emphasize the importance of synthetic data generation and the appropriate calibration of complex models to market data. A numerically studied stylized example illustrates these possibilities, including the importance of problem dimension in the degree of overlearning, and the effectiveness of this approach.

LGJul 15, 2020
Deep PQR: Solving Inverse Reinforcement Learning using Anchor Actions

Sinong Geng, Houssam Nassif, Carlos A. Manzanares et al.

We propose a reward function estimation framework for inverse reinforcement learning with deep energy-based policies. We name our method PQR, as it sequentially estimates the Policy, the $Q$-function, and the Reward function by deep learning. PQR does not assume that the reward solely depends on the state, instead it allows for a dependency on the choice of action. Moreover, PQR allows for stochastic state transitions. To accomplish this, we assume the existence of one anchor action whose reward is known, typically the action of doing nothing, yielding no reward. We present both estimators and algorithms for the PQR method. When the environment transition is known, we prove that the PQR reward estimator uniquely recovers the true reward. With unknown transitions, we bound the estimation error of PQR. Finally, the performance of PQR is demonstrated by synthetic and real-world datasets.