LG, May 18, 2025

Resolving Latency and Inventory Risk in Market Making with Reinforcement Learning

Junzhe Jiang, Chang Yang, Xinrun Wang, Zhiming Li, Xiao Huang, Bo Li

arXiv:2505.12465v14.11 citations, h-index: 5

Originality: Incremental advance

AI Analysis

This paper addresses a practical limitation for market makers: existing RL methods fail in real scenarios because they ignore latency. The contribution appears incremental, adapting RL to specific market constraints.

The paper tackles the problem of latency and inventory risk in market making by proposing Relaver, a reinforcement learning method that incorporates order hold time and uses dynamic programming guidance. Experiments on four real-world datasets show it significantly improves performance over state-of-the-art RL-based market making strategies.

Exchange latency in market making (MM) is inevitable due to hardware limitations, system processing times, delays in receiving data from exchanges, the time required for order transmission to reach the market, and similar factors. Existing reinforcement learning (RL) methods for MM overlook the impact of this latency, which can lead to unintended order cancellations due to price discrepancies between decision and execution times and result in undesired inventory accumulation, exposing MM traders to increased market risk. Therefore, these methods cannot be applied in real MM scenarios. To address these issues, we first build a realistic MM environment with random delays of 30-100 milliseconds for order placement and market information reception, and implement a batch matching mechanism that collects orders within every 500 milliseconds before matching them all at once, simulating the batch auction mechanisms adopted by some exchanges. Then, we propose Relaver, an RL-based method for MM that tackles the latency and inventory risk issues. The three main contributions of Relaver are: i) we introduce an augmented state-action space that incorporates order hold time alongside price and volume, enabling Relaver to optimize execution strategies under latency constraints and time-priority matching mechanisms; ii) we leverage dynamic programming (DP) to guide the exploration of RL training toward better policies; iii) we train a market trend predictor, which guides the agent to intelligently adjust its inventory to reduce risk. Extensive experiments and ablation studies on four real-world datasets demonstrate that Relaver significantly improves the performance of state-of-the-art RL-based MM strategies across multiple metrics.
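
The environment described in the abstract (random 30-100 ms delays and 500 ms batch matching) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The names `Order`, `submit`, `batch_index`, and `match_batch` are hypothetical, and the matching rule (price priority, then arrival time, filling at the resting sell price) is a simplifying assumption, since the abstract does not specify how a batch is cleared. Order hold time, which Relaver adds to the action space, is omitted here.

```python
import random
from dataclasses import dataclass

BATCH_MS = 500              # batch window stated in the abstract
DELAY_RANGE_MS = (30, 100)  # random latency range stated in the abstract


@dataclass
class Order:
    arrival_ms: int  # time the order reaches the exchange (decision time + delay)
    side: str        # "buy" or "sell"
    price: float
    volume: int      # assumed to be a positive integer


def submit(decision_ms, side, price, volume, rng):
    """Apply a random transmission delay to an order decided at decision_ms."""
    delay = rng.randint(*DELAY_RANGE_MS)  # inclusive on both ends
    return Order(decision_ms + delay, side, price, volume)


def batch_index(order):
    """Index of the 500 ms batch in which the order arrives."""
    return order.arrival_ms // BATCH_MS


def match_batch(orders):
    """Match one batch at once (assumed rule: price priority, then time priority).

    Returns a list of (price, volume) fills, priced at the sell order's limit.
    """
    buys = sorted((o for o in orders if o.side == "buy"),
                  key=lambda o: (-o.price, o.arrival_ms))
    sells = sorted((o for o in orders if o.side == "sell"),
                   key=lambda o: (o.price, o.arrival_ms))
    buy_rem = [o.volume for o in buys]
    sell_rem = [o.volume for o in sells]
    fills = []
    i = j = 0
    # Keep crossing the best remaining buy and sell while their prices overlap.
    while i < len(buys) and j < len(sells) and buys[i].price >= sells[j].price:
        qty = min(buy_rem[i], sell_rem[j])
        fills.append((sells[j].price, qty))
        buy_rem[i] -= qty
        sell_rem[j] -= qty
        if buy_rem[i] == 0:
            i += 1
        if sell_rem[j] == 0:
            j += 1
    return fills
```

In this sketch, an order decided at 480 ms always arrives in the following batch (510-580 ms), so it is matched against whatever the book looks like one window later. That gap between decision and execution is the source of the price discrepancy the abstract describes.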
