Nabeel Ahmad Saidd

10.7 GN Mar 20

Decomposable Reward Modeling and Realistic Environment Design for Reinforcement Learning-Based Forex Trading

Nabeel Ahmad Saidd

Applying reinforcement learning (RL) to foreign exchange (Forex) trading remains challenging because a realistic environment, a well-defined reward function, and an expressive action space are all needed at once. Many prior studies instead rely on simplified simulators, single scalar rewards, and restricted action representations, which limits both interpretability and practical relevance. This paper presents a modular RL framework that addresses these limitations through three tightly integrated components. The first is a friction-aware execution engine that enforces strict anti-lookahead semantics (observation at time t, execution at time t+1, and mark-to-market at time t+1) while incorporating realistic costs such as spread, commission, slippage, rollover financing, and margin-triggered liquidation. The second is a decomposable 11-component reward architecture with fixed weights and per-step diagnostic logging, which enables systematic ablation and component-level attribution. The third is a 10-action discrete interface with legal-action masking that encodes explicit trading primitives while enforcing margin-aware feasibility constraints. Empirical evaluation on EURUSD focuses on learning dynamics rather than generalization. It reveals strongly non-monotonic reward interactions, in which additional penalties do not reliably improve outcomes; the full reward configuration achieves the highest training Sharpe (0.765) and cumulative return (57.09 percent). The expanded action space increases return but also raises turnover and lowers Sharpe relative to a conservative 3-action baseline, indicating a return-activity trade-off under a fixed training budget. Scaling-enabled variants consistently reduce drawdown, and the combined configuration achieves the strongest endpoint performance.
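The three components named in the abstract above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the names `Bar`, `execute_next_bar`, `combine_reward`, `legal_action_mask`, and `best_legal_action` are hypothetical, the fill-at-next-open convention is an assumption, and only spread and commission are modeled (slippage, rollover, and liquidation are left out).

```python
from dataclasses import dataclass


@dataclass
class Bar:
    open: float
    close: float


def execute_next_bar(bars, t, units, half_spread, commission_per_unit):
    """Fill an order decided on bar t at the open of bar t+1, then mark it
    to market at the close of bar t+1 (assumed convention). Nothing from
    bar t+1 is visible when the decision is made."""
    nxt = bars[t + 1]
    side = (units > 0) - (units < 0)        # +1 buy, -1 sell, 0 flat
    fill = nxt.open + side * half_spread    # buys pay the ask, sells hit the bid
    pnl = units * (nxt.close - fill) - abs(units) * commission_per_unit
    return fill, pnl


def combine_reward(components, weights):
    """Weighted sum with fixed weights. The per-component contributions are
    returned too, so they can be logged at every step."""
    contributions = {name: weights[name] * value
                     for name, value in components.items()}
    return sum(contributions.values()), contributions


def legal_action_mask(margin_required, free_margin):
    """An action is legal only if the margin it needs fits in free margin."""
    return [need <= free_margin for need in margin_required]


def best_legal_action(scores, mask):
    """Pick the highest-scoring legal action. Assumes at least one action
    (for example a zero-margin hold) is always legal."""
    legal = [i for i, ok in enumerate(mask) if ok]
    return max(legal, key=lambda i: scores[i])
```

Returning the contribution dictionary alongside the scalar reward is what makes ablation straightforward in a design like this: setting one weight to zero removes a component without touching the rest of the pipeline.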

ST Feb 27

A Controlled Comparison of Deep Learning Architectures for Multi-Horizon Financial Forecasting: Evidence from 918 Experiments

Nabeel Ahmad Saidd

Multi-horizon price forecasting is central to portfolio allocation, risk management, and algorithmic trading, yet deep learning architectures have proliferated faster than rigorous financial benchmarks can evaluate them. This study provides a controlled comparison of nine architectures (Autoformer, DLinear, iTransformer, LSTM, ModernTCN, N-HiTS, PatchTST, TimesNet, and TimeXer) spanning Transformer, MLP, CNN, and RNN families across cryptocurrency, forex, and equity index markets at 4-hour and 24-hour horizons. A total of 918 experiments were conducted under a strict five-stage protocol including fixed-seed Bayesian hyperparameter optimization, configuration freezing per asset class, multi-seed retraining, uncertainty aggregation, and statistical validation. ModernTCN achieves the best mean rank (1.333) with a 75 percent first-place rate, followed by PatchTST (2.000). Results reveal a clear three-tier ranking structure and show that architecture explains nearly all performance variance, while seed randomness is negligible. Rankings remain stable across horizons despite 2 to 2.5 times error amplification. Directional accuracy remains near 50 percent across all configurations, indicating that MSE-trained models lack directional skill at hourly resolution. The findings highlight the importance of architectural inductive bias over raw parameter count and provide reproducible guidance for multi-step financial forecasting.
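The ranking and directional-accuracy metrics mentioned in the abstract above can be sketched as follows. This is an illustrative Python sketch with toy numbers, not the study's evaluation code: the function names `rank_summary` and `directional_accuracy` are hypothetical, ties in the error table are assumed absent, and direction is measured relative to the last observed price.

```python
import numpy as np


def rank_summary(errors):
    """errors has shape (n_evaluations, n_models); lower error is better.
    Returns the mean rank and first-place rate of each model.
    Assumes no exact ties within a row."""
    ranks = errors.argsort(axis=1).argsort(axis=1) + 1   # rank 1 = lowest error
    return ranks.mean(axis=0), (ranks == 1).mean(axis=0)


def directional_accuracy(last, predicted, actual):
    """Share of forecasts whose predicted move from the last observed
    price has the same sign as the realized move."""
    predicted_move = np.sign(np.asarray(predicted) - np.asarray(last))
    realized_move = np.sign(np.asarray(actual) - np.asarray(last))
    return float((predicted_move == realized_move).mean())


# Toy data: three evaluation slots, three models.
toy_errors = np.array([[0.1, 0.2, 0.3],
                       [0.2, 0.1, 0.3],
                       [0.1, 0.3, 0.2]])
mean_rank, first_rate = rank_summary(toy_errors)

toy_da = directional_accuracy(last=[1.0, 1.0, 1.0, 1.0],
                              predicted=[1.1, 0.9, 1.2, 0.8],
                              actual=[1.2, 1.1, 1.3, 0.7])
```

A model can have low squared error and still score near 0.5 on `directional_accuracy`, because a forecast that stays close to the last price minimizes error without committing to a direction.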

2 Papers