Zhendong Shi

4 papers

14 citations

Novelty 30%

AI Score 19

Ranked #197,236 of 205,806 authors (top 96%); #6,643 in CR (top 91%)

4 Papers

TR · Mar 15, 2023

Optimizing Trading Strategies in Quantitative Markets using Multi-Agent Reinforcement Learning

Hengxi Zhang, Zhendong Shi, Yuanquan Hu et al.

Quantitative markets are characterized by swift dynamics and abundant uncertainties, making the pursuit of profit-driven stock trading actions inherently challenging. Within this context, reinforcement learning (RL), which operates on a reward-centric mechanism for optimal control, has surfaced as a potentially effective solution to the intricate financial decision-making conundrums presented. This paper delves into the fusion of two established financial trading strategies, namely the constant proportion portfolio insurance (CPPI) and the time-invariant portfolio protection (TIPP), with the multi-agent deep deterministic policy gradient (MADDPG) framework. As a result, we introduce two novel multi-agent RL (MARL) methods, CPPI-MADDPG and TIPP-MADDPG, tailored for probing strategic trading within quantitative markets. To validate these innovations, we implemented them on a diverse selection of 100 real-market shares. Our empirical findings reveal that the CPPI-MADDPG and TIPP-MADDPG strategies consistently outpace their traditional counterparts, affirming their efficacy in the realm of quantitative trading.
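The two classical strategies named in this abstract can be sketched in a few lines. This is a minimal illustration of the textbook CPPI and TIPP allocation rules only, not the paper's CPPI-MADDPG or TIPP-MADDPG methods. The function names, the no-leverage cap, and the parameter values (`floor_ratio`, `multiplier`) are assumptions chosen for the example.

```python
def allocate(value, floor, multiplier):
    """Split portfolio value into (risky, safe) using the CPPI cushion rule."""
    cushion = max(value - floor, 0.0)          # amount above the protected floor
    risky = min(multiplier * cushion, value)   # assumption: no leverage allowed
    return risky, value - risky


def simulate(risky_returns, start=100.0, floor_ratio=0.8, multiplier=3.0,
             ratchet=False, safe_return=0.0):
    """Run CPPI (ratchet=False) or TIPP (ratchet=True) over a return series.

    With ratchet=True the floor is raised to floor_ratio * value whenever the
    portfolio reaches a new high, which is the TIPP modification of CPPI.
    """
    value = start
    floor = floor_ratio * start
    path = [value]
    for r in risky_returns:
        risky, safe = allocate(value, floor, multiplier)
        value = risky * (1.0 + r) + safe * (1.0 + safe_return)
        if ratchet:
            floor = max(floor, floor_ratio * value)  # floor never decreases
        path.append(value)
    return path, floor


if __name__ == "__main__":
    returns = [0.10, -0.20]
    print("CPPI:", simulate(returns))
    print("TIPP:", simulate(returns, ratchet=True))
```

In this sketch, TIPP holds less of the risky asset than CPPI after a gain because its floor has moved up, so it loses less in the subsequent drop.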

LG · Oct 1, 2023

From Bandits Model to Deep Deterministic Policy Gradient, Reinforcement Learning with Contextual Information

Zhendong Shi, Xiaoli Wei, Ercan E. Kuruoglu

The problem of how to take the right actions to make profits in a sequential process remains difficult because of the quick dynamics and significant uncertainty in many application scenarios. In such complicated environments, reinforcement learning (RL), a reward-oriented strategy for optimum control, has emerged as a potential technique to address this strategic decision-making issue. However, reinforcement learning also has shortcomings, such as excessive resource consumption and an inability to quickly obtain optimal solutions, that make it unsuitable for many financial problems, including quantitative trading markets. In this study, we use two methods that exploit contextual information to overcome this issue: contextual Thompson sampling and reinforcement learning under supervision, both of which can accelerate the iterations in search of the best answer. To investigate strategic trading in quantitative markets, we merged the earlier financial trading strategy known as constant proportion portfolio insurance (CPPI) into deep deterministic policy gradient (DDPG). The experimental results show that both methods can accelerate the progress of reinforcement learning toward the optimal solution.

ML · Mar 19, 2022

Thompson Sampling on Asymmetric $α$-Stable Bandits

Zhendong Shi, Ercan E. Kuruoglu, Xiaoli Wei

When optimizing reinforcement learning algorithms, handling the exploration-exploitation dilemma is particularly important. The multi-armed bandit problem can optimize the proposed solutions by changing the reward distribution to realize a dynamic balance between exploration and exploitation. Thompson Sampling is a common method for solving the multi-armed bandit problem and has been used to explore data that conform to various laws. In this paper, we consider the Thompson Sampling approach for the multi-armed bandit problem in which rewards conform to unknown asymmetric $α$-stable distributions, and we explore their applications in modelling financial and wireless data.
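For readers unfamiliar with Thompson Sampling, the sketch below shows the basic idea in its simplest standard setting: Bernoulli rewards with Beta posteriors. This is deliberately not the asymmetric $α$-stable reward model studied in the paper, whose posterior updates are more involved. The function name, arm probabilities, and seed are assumptions made for illustration.

```python
import random


def thompson_bernoulli(true_probs, rounds, seed=0):
    """Beta-Bernoulli Thompson Sampling; returns how often each arm was pulled."""
    rng = random.Random(seed)
    k = len(true_probs)
    successes = [1] * k   # Beta(1, 1) uniform prior for every arm
    failures = [1] * k
    pulls = [0] * k
    for _ in range(rounds):
        # Draw one plausible mean reward per arm from its current posterior.
        samples = [rng.betavariate(successes[i], failures[i]) for i in range(k)]
        # Play the arm whose sampled mean is largest.
        arm = max(range(k), key=lambda i: samples[i])
        # Simulated environment: hypothetical Bernoulli reward.
        reward = 1 if rng.random() < true_probs[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


if __name__ == "__main__":
    print(thompson_bernoulli([0.1, 0.9], rounds=2000))
```

Sampling from the posterior, rather than always playing the arm with the highest estimated mean, is what balances exploration and exploitation: uncertain arms occasionally produce large samples and get tried again.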

CR · Nov 28, 2021

Visualization and Attack Prevention for a Sensor-Based Agricultural Monitoring System

Yifan Zhou, Zhendong Shi, Ruoxi Sun

This project proposes a sensor-based visual agricultural monitoring system. Distinguished from traditional agricultural monitoring systems, this system further analyzes basic agricultural data and prevents and monitors common wireless network attacks such as Selective Forwarding, Black Hole Attacks, Sinkhole Attacks, Flooding Attacks and Misdirection Attacks. Experimental verification and evaluation of the attack prevention and monitoring are also conducted.