AI, LG, MA · Jul 6, 2023

Learning Multi-Agent Intention-Aware Communication for Optimal Multi-Order Execution in Finance

Yuchen Fang, Zhenggang Tang, Kan Ren, Weiqing Liu, Li Zhao, Jiang Bian, Dongsheng Li, Weinan Zhang, Yong Yu, Tie-Yan Liu

arXiv:2307.03119v1 · 13.216 citations · h-index: 91

Originality: Incremental advance

AI Analysis

The work addresses a practical limitation in quantitative finance that matters to traders and institutions. It is incremental, however: it builds on existing multi-agent RL methods and adds a novel communication mechanism.

The paper tackles the problem of executing multiple trading orders simultaneously in finance, which existing methods overlook, and proposes a multi-agent reinforcement learning approach with a learnable communication protocol that achieves significantly better collaboration effectiveness in experiments on real-world market data.

Order execution is a fundamental task in quantitative finance. Its goal is to complete the acquisition or liquidation of a number of trading orders for specific assets. Recent advances in model-free reinforcement learning (RL) provide a data-driven solution to the order execution problem. However, existing works optimize execution only for an individual order. They overlook the common practice of executing multiple orders simultaneously, and this results in suboptimality and bias. In this paper, we first present a multi-agent RL (MARL) method for multi-order execution that considers practical constraints. Specifically, we treat every agent as an individual operator that trades one specific order. The agents keep communicating with each other and collaborate to maximize overall profit. Nevertheless, existing MARL algorithms often implement communication among agents by exchanging only information about their partial observations, which is inefficient in complicated financial markets. To improve collaboration, we then propose a learnable multi-round communication protocol through which agents share their intended actions with each other and refine them accordingly. The protocol is optimized through a novel action value attribution method that is provably consistent with the original learning objective yet more efficient. Experiments on data from two real-world markets demonstrate superior performance, and our method achieves significantly better collaboration effectiveness.
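To make the idea of "communicating intended actions over multiple rounds" concrete, here is a toy sketch. It is not the authors' method. In the paper, both the policy and the communication protocol are learned. Here, a fixed hand-written rule stands in for them, and no learning or action value attribution is shown. The shared constraint is a hypothetical assumption: buy orders may spend only the cash on hand plus the proceeds of the sells planned in the same step. The names `Order`, `cash_flow`, `refine`, `communicate`, `speedup` and `rounds` are all illustrative. Each agent owns one order and starts from an even-split intention. In every round, all agents read the previous round's intentions and revise their own simultaneously: buyers shrink to fit the budget, and sellers accelerate when buyers are starved.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Order:
    side: str         # "buy" or "sell"
    remaining: float  # shares still to execute
    price: float      # current price per share (held fixed in this toy)


def cash_flow(orders: List[Order], qty: List[float], side: str) -> float:
    """Total notional of one side (buy or sell) for the proposed quantities."""
    return sum(q * o.price for o, q in zip(orders, qty) if o.side == side)


def refine(orders, intents, desired, cash, speedup=0.5):
    """One communication round (hand-written stand-in for a learned protocol).

    Every agent reads all intended actions from the previous round and revises
    its own simultaneously. Hypothetical constraint: buys are funded only by
    cash on hand plus the proceeds of the sells intended in this step.
    """
    budget = cash + cash_flow(orders, intents, "sell")
    wanted = cash_flow(orders, desired, "buy")
    scale = 1.0 if wanted <= budget else budget / wanted
    revised = []
    for o, q, d in zip(orders, intents, desired):
        if o.side == "buy":
            revised.append(d * scale)  # buyers shrink their slice to fit the budget
        elif scale < 1.0:
            # sellers see starved buyers and speed up, capped by what they still hold
            revised.append(min(o.remaining, q * (1.0 + speedup)))
        else:
            revised.append(q)  # no shortfall: sellers keep their intention
    return revised


def communicate(orders, steps_left, cash, rounds=3):
    """Exchange intended actions for a few rounds before anyone trades."""
    desired = [o.remaining / max(steps_left, 1) for o in orders]  # even-split first guess
    intents = list(desired)
    for _ in range(rounds):
        intents = refine(orders, intents, desired, cash)
    return intents


if __name__ == "__main__":
    book = [Order("buy", 1000, 10.0), Order("sell", 400, 20.0)]
    print(communicate(book, steps_left=10, cash=0.0))  # [100.0, 60.0]
```

In the usage example, a single round would cut the buyer to 80 shares. That is because the seller's original 40 shares raise only 800 in cash. In the second round, the buyer sees that the seller now intends to sell 60 shares and restores its full 100-share slice. This is the kind of intention-level coordination the abstract describes. A learned protocol would replace the fixed `speedup` and rescaling rules with policies trained on the execution objective.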
