CPAug 16, 2024
Gradient Reduction Convolutional Neural Network Policy for Financial Deep Reinforcement LearningSina Montazeri, Haseebullah Jumakhan, Sonia Abrasiabian et al.
Building on our prior explorations of convolutional neural networks (CNNs) for financial data processing, this paper introduces two significant enhancements to refine our CNN model's predictive performance and robustness for financial tabular data. Firstly, we integrate a normalization layer at the input stage to ensure consistent feature scaling, addressing the issue of disparate feature magnitudes that can skew the learning process. This modification is hypothesized to aid in stabilizing the training dynamics and improving the model's generalization across diverse financial datasets. Secondly, we employ a Gradient Reduction Architecture, where earlier layers are wider and subsequent layers are progressively narrower. This enhancement is designed to enable the model to capture more complex and subtle patterns within the data, a crucial factor in accurately predicting financial outcomes. These advancements directly respond to the limitations identified in previous studies, where simpler models struggled with the complexity and variability inherent in financial applications. Initial tests confirm that these changes improve accuracy and model stability, suggesting that deeper and more nuanced network architectures can significantly benefit financial predictive tasks. This paper details the implementation of these enhancements and evaluates their impact on the model's performance in a controlled experimental setting.
STJan 10, 2024
CNN-DRL for Scalable Actions in FinanceSina Montazeri, Akram Mirzaeinia, Haseebullah Jumakhan et al.
The published MLP-based DRL in finance has difficulties in learning the dynamics of the environment when the action scale increases. If the buying and selling increase to one thousand shares, the MLP agent will not be able to effectively adapt to the environment. To address this, we designed a CNN agent that concatenates the data from the last ninety days of the daily feature vector to create the CNN input matrix. Our extensive experiments demonstrate that the MLP-based agent experiences a loss corresponding to the initial environment setup, while our designed CNN remains stable, effectively learns the environment, and leads to an increase in rewards.
LGFeb 18, 2025
Finding Optimal Trading History in Reinforcement Learning for Stock Market TradingSina Montazeri, Haseebullah Jumakhan, Amir Mirzaeinia
This paper investigates the optimization of temporal windows in Financial Deep Reinforcement Learning (DRL) models using 2D Convolutional Neural Networks (CNNs). We introduce a novel approach to treating the temporal field as a hyperparameter and examine its impact on model performance across various datasets and feature arrangements. We introduce a new hyperparameter for the CNN policy, proposing that this temporal field can and should be treated as a hyperparameter for these models. We examine the significance of this temporal field by iteratively expanding the window of observations presented to the CNN policy during the deep reinforcement learning process. Our iterative process involves progressively increasing the observation period from two weeks to twelve weeks, allowing us to examine the effects of different temporal windows on the model's performance. This window expansion is implemented in two settings. In one setting, we rearrange the features in the dataset to group them by company, allowing the model to have a full view of company data in its observation window and CNN kernel. In the second setting, we do not group the features by company, and features are arranged by category. Our study reveals that shorter temporal windows are most effective when no feature rearrangement to group per company is in effect. However, the model will utilize longer temporal windows and yield better performance once we introduce the feature rearrangement. To examine the consistency of our findings, we repeated our experiment on two datasets containing the same thirty companies from the Dow Jones Index but with different features in each dataset and consistently observed the above-mentioned patterns. The result is a trading model significantly outperforming global financial services firms such as the Global X Guru by the established Mirae Asset.
TRJun 29, 2024
Deep Reinforcement Learning Strategies in Finance: Insights into Asset Holding, Trading Behavior, and Purchase DiversityAlireza Mohammadshafie, Akram Mirzaeinia, Haseebullah Jumakhan et al.
Recent deep reinforcement learning (DRL) methods in finance show promising outcomes. However, there is limited research examining the behavior of these DRL algorithms. This paper aims to investigate their tendencies towards holding or trading financial assets as well as purchase diversity. By analyzing their trading behaviors, we provide insights into the decision-making processes of DRL models in finance applications. Our findings reveal that each DRL algorithm exhibits unique trading patterns and strategies, with A2C emerging as the top performer in terms of cumulative rewards. While PPO and SAC engage in significant trades with a limited number of stocks, DDPG and TD3 adopt a more balanced approach. Furthermore, SAC and PPO tend to hold positions for shorter durations, whereas DDPG, A2C, and TD3 display a propensity to remain stationary for extended periods.