Asynchronous Deep Double Duelling Q-Learning for Trading-Signal Execution in Limit Order Book Markets
This work addresses adaptive trading strategies for limit order book markets. Its contribution is incremental: it builds on existing RL methods without introducing major innovations.
The authors tackled the problem of translating high-frequency trading signals into effective limit order placements using deep reinforcement learning, and found that their RL agent outperformed a heuristic benchmark strategy when tested with synthetic alpha signals.
We employ deep reinforcement learning (RL) to train an agent to translate a high-frequency trading signal into a trading strategy that places individual limit orders. Based on the ABIDES limit order book simulator, we build a reinforcement learning OpenAI Gym environment and use it to simulate a realistic trading environment for NASDAQ equities based on historical order book messages. To train a trading agent that learns to maximise its trading return in this environment, we use Deep Double Duelling Q-learning with the APEX (asynchronous prioritised experience replay) architecture. The agent observes the current limit order book state, its recent history, and a short-term directional forecast. To investigate the performance of RL for adaptive trading independently of a concrete forecasting algorithm, we study the performance of our approach using synthetic alpha signals obtained by perturbing forward-looking returns with varying levels of noise. We find that the RL agent learns an effective strategy for inventory management and order placement that outperforms a heuristic benchmark trading strategy with access to the same signal.
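The synthetic alpha signal mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' construction: the abstract only states that forward-looking returns are perturbed with varying levels of noise. The additive Gaussian noise, the simple-return definition, the threshold-based directional label, and all function and parameter names (`forward_returns`, `synthetic_alpha`, `directional_label`, `horizon`, `noise_std`, `threshold`) are assumptions made for illustration.

```python
import random


def forward_returns(mid_prices, horizon):
    """Simple return from t to t + horizon; None where no future price exists."""
    n = len(mid_prices)
    out = []
    for t in range(n):
        if t + horizon < n:
            out.append(mid_prices[t + horizon] / mid_prices[t] - 1.0)
        else:
            out.append(None)
    return out


def synthetic_alpha(mid_prices, horizon, noise_std, seed=0):
    """Perturb forward-looking returns with zero-mean Gaussian noise (assumed form).

    noise_std = 0 gives a perfect-foresight signal; larger values degrade it.
    """
    rng = random.Random(seed)  # seeded so that the signal is reproducible
    signal = []
    for r in forward_returns(mid_prices, horizon):
        signal.append(None if r is None else r + rng.gauss(0.0, noise_std))
    return signal


def directional_label(value, threshold):
    """Map a noisy return to a direction: +1 (up), -1 (down), 0 (neutral)."""
    if value is None:
        return None
    if value > threshold:
        return 1
    if value < -threshold:
        return -1
    return 0


if __name__ == "__main__":
    prices = [100.0, 101.0, 99.0, 102.0, 102.5]  # hypothetical mid-prices
    for noise in (0.0, 0.01, 0.05):
        alpha = synthetic_alpha(prices, horizon=1, noise_std=noise)
        print(noise, [directional_label(a, threshold=0.005) for a in alpha])
```

Sweeping `noise_std` in this way lets signal quality be varied independently of any particular forecasting model, which is the purpose the abstract gives for using synthetic signals.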