LG AI MA | Oct 1, 2023

From Bandits Model to Deep Deterministic Policy Gradient, Reinforcement Learning with Contextual Information

Zhendong Shi, Xiaoli Wei, Ercan E. Kuruoglu

arXiv:2310.00642v1 | 2.0 | h-index: 29

Originality: Synthesis-oriented

AI Analysis

This work addresses inefficiencies in reinforcement learning for financial decision-making, though its contribution appears incremental, since it combines existing methods.

The study tackles slow convergence and high resource consumption in reinforcement learning for quantitative trading by integrating contextual information and a known financial strategy into the learning process, which speeds up iteration toward optimal solutions.

Taking the right actions to make profits in a sequential process remains difficult because of fast dynamics and significant uncertainty in many application scenarios. In such complicated environments, reinforcement learning (RL), a reward-oriented strategy for optimal control, has emerged as a promising technique for this strategic decision-making problem. However, reinforcement learning has shortcomings, notably excessive resource consumption and an inability to obtain optimal solutions quickly, that make it unsuitable for many financial problems, including quantitative trading markets. In this study, we use two methods that exploit contextual information to overcome this issue: contextual Thompson sampling and reinforcement learning under supervision, both of which can accelerate the iterations in search of the best solution. To investigate strategic trading in quantitative markets, we merge an established financial trading strategy, constant proportion portfolio insurance (CPPI), into deep deterministic policy gradient (DDPG). The experimental results show that both methods can accelerate the progress of reinforcement learning toward the optimal solution.
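For readers unfamiliar with CPPI, the allocation rule named in the abstract can be sketched in a few lines of Python. This is a minimal illustration of the textbook rule (risky exposure equals a multiplier times the cushion above a floor), not the authors' implementation; how the paper couples CPPI with DDPG is not described in this summary. The function name `cppi_path`, its parameters, the constant floor, and the no-leverage cap are all assumptions made for the example.

```python
from typing import List, Sequence


def cppi_path(
    risky_returns: Sequence[float],
    initial_value: float = 100.0,
    floor_fraction: float = 0.8,
    multiplier: float = 3.0,
    safe_return: float = 0.0,
) -> List[float]:
    """Return the portfolio value after each period under a basic CPPI rule.

    All names and defaults here are hypothetical, chosen for illustration.
    """
    # Assumption: the floor is a constant fraction of the initial value.
    floor = floor_fraction * initial_value
    value = initial_value
    path: List[float] = []
    for r in risky_returns:
        # Cushion: how far the portfolio sits above the protected floor.
        cushion = max(value - floor, 0.0)
        # Assumption: exposure is capped at the portfolio value (no leverage).
        exposure = min(multiplier * cushion, value)
        # Risky part earns r; the remainder earns the safe return.
        value = exposure * (1.0 + r) + (value - exposure) * (1.0 + safe_return)
        path.append(value)
    return path


if __name__ == "__main__":
    print(cppi_path([0.05, -0.10, 0.02]))
```

With discrete rebalancing, the floor is protected only when a single-period loss on the risky asset stays below `1 / multiplier`; a larger drop can push the value under the floor, after which this sketch holds only the safe asset.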
