Representation Learning of Limit Order Book: A Comprehensive Study and Benchmarking
This work addresses the limited reusability and generalization of learned LOB models in financial market analysis, a problem relevant to both researchers and practitioners. Its contribution is incremental, since it builds on existing representation learning methods.
This paper tackles the challenge of learning transferable representations from Limit Order Book (LOB) data by introducing LOBench, a standardized benchmark with real market data, and demonstrates that LOB representations outperform traditional task-specific and general time-series models in various downstream tasks.
The Limit Order Book (LOB), the most fundamental data in financial markets, provides a fine-grained view of market dynamics but poses significant challenges for deep models because of its strong autocorrelation, cross-feature constraints, and feature scale disparity. Existing approaches often tightly couple representation learning with specific downstream tasks in an end-to-end manner and fail to analyze the learned representations separately and explicitly, which limits their reusability and generalization. This paper conducts the first systematic comparative study of LOB representation learning, aiming to identify effective ways of extracting transferable, compact features that capture essential LOB properties. We introduce LOBench, a standardized benchmark built on real China A-share market data, offering curated datasets, unified preprocessing, consistent evaluation metrics, and strong baselines. Extensive experiments validate the sufficiency and necessity of LOB representations for various downstream tasks and highlight their advantages over both traditional task-specific end-to-end models and advanced representation learning models for general time series. Our work establishes a reproducible framework and provides clear guidelines for future research. Datasets and code will be publicly available at https://github.com/financial-simulation-lab/LOBench.
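Two of the difficulties named above, cross-feature constraints and feature scale disparity, can be made concrete with a small sketch. It uses two made-up snapshots; the column layout, the level count `LEVELS`, and the helper names `price_constraints_hold` and `normalize_by_group` are hypothetical and chosen for illustration only. They are not LOBench's data format or its preprocessing pipeline. The sketch checks the ordering constraints a valid snapshot is assumed to satisfy (ask prices rise with depth, bid prices fall with depth, the book is not crossed) and applies a grouped z-score that shares one mean and standard deviation across all price columns and another across all volume columns.

```python
import numpy as np

# Hypothetical layout (an assumption, not LOBench's actual format): each row of X is
# one snapshot with LEVELS price levels per side, ordered as
# [ask prices | ask volumes | bid prices | bid volumes].
LEVELS = 5
PRICE_COLS = np.r_[0:LEVELS, 2 * LEVELS:3 * LEVELS]
VOLUME_COLS = np.r_[LEVELS:2 * LEVELS, 3 * LEVELS:4 * LEVELS]


def price_constraints_hold(x: np.ndarray) -> bool:
    """Check the cross-feature price constraints of a single snapshot."""
    ask_p = x[:LEVELS]
    bid_p = x[2 * LEVELS:3 * LEVELS]
    asks_ascending = np.all(np.diff(ask_p) > 0)   # deeper asks quote higher prices
    bids_descending = np.all(np.diff(bid_p) < 0)  # deeper bids quote lower prices
    not_crossed = ask_p[0] > bid_p[0]             # best ask stays above best bid
    return bool(asks_ascending and bids_descending and not_crossed)


def normalize_by_group(X: np.ndarray) -> np.ndarray:
    """Z-score prices and volumes as two groups, each with one shared mean/std.

    Within a group this is a single positive affine map, so the price ordering
    checked above is preserved while the price/volume scale gap is removed.
    """
    Z = np.empty_like(X, dtype=float)
    for cols in (PRICE_COLS, VOLUME_COLS):
        block = X[:, cols]
        Z[:, cols] = (block - block.mean()) / block.std()
    return Z


# Two made-up snapshots: prices near 10, volumes in the hundreds to thousands.
X = np.array([
    [10.01, 10.02, 10.03, 10.04, 10.05, 1200, 800, 1500, 600, 900,
     10.00, 9.99, 9.98, 9.97, 9.96, 1000, 700, 1300, 500, 1100],
    [10.02, 10.03, 10.04, 10.05, 10.06, 900, 1100, 700, 1400, 800,
     10.01, 10.00, 9.99, 9.98, 9.97, 1300, 600, 900, 1200, 700],
])

Z = normalize_by_group(X)
print(X[:, VOLUME_COLS].mean() / X[:, PRICE_COLS].mean())  # raw volume/price scale ratio
print(all(price_constraints_hold(row) for row in Z))        # ordering check after scaling
```

The grouped statistics are a deliberate choice in this sketch: z-scoring every column independently applies a different affine map to each price level, which is not guaranteed to keep ask and bid levels in order, whereas one shared map per group keeps the constraints intact while bringing prices and volumes to a comparable scale.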