STLG Aug 16, 2023

Microstructure-Empowered Stock Factor Extraction and Utilization

arXiv:2308.08135v1 · 2 citations · h-index: 49
Originality: Incremental advance
AI Analysis

The work addresses a practical problem for quantitative traders by enabling better use of detailed order flow data. It appears incremental, however, since it builds on existing tick-level approaches and mainly broadens the amount of data they can handle.

The paper tackles the extraction of useful factors from high-frequency order flow data for stock investment. It proposes a framework built from a Context Encoder and a Factor Extractor that processes a full year of data and reports significant improvement on stock trend prediction and order execution tasks at the second and minute levels.

High-frequency quantitative investment is a crucial aspect of stock investment. Notably, order flow data plays a critical role as it provides the most detailed level of information among high-frequency trading data, including comprehensive data from the order book and transaction records at the tick level. Order flow data is extremely valuable for market analysis as it equips traders with essential insights for making informed decisions. However, extracting and effectively utilizing order flow data present challenges due to the large volume of data involved and the limitations of traditional factor mining techniques, which are primarily designed for coarser-level stock data. To address these challenges, we propose a novel framework that aims to effectively extract essential factors from order flow data for diverse downstream tasks across different granularities and scenarios. Our method consists of a Context Encoder and a Factor Extractor. The Context Encoder learns an embedding for the current order flow data segment's context by considering both the expected and actual market state. In addition, the Factor Extractor uses unsupervised learning methods to select the important signals that are most distinct from the majority within the given context. The extracted factors are then utilized for downstream tasks. In empirical studies, our proposed framework efficiently handles an entire year of stock order flow data across diverse scenarios, offering a broader range of applications compared to existing tick-level approaches that are limited to only a few days of stock data. We demonstrate that our method extracts superior factors from order flow data, enabling significant improvement for stock trend prediction and order execution tasks at the second and minute levels.
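
The idea of selecting signals that are "most distinct from the majority" can be illustrated with a very small sketch. This is not the paper's Factor Extractor, whose details the abstract does not give. It assumes, purely for illustration, that each candidate signal is already represented as a fixed-length vector, and it swaps in a simple stand-in criterion: Euclidean distance from the centroid of all candidates. The names `distinctness_scores`, `select_most_distinct`, and the sample `signals` data are hypothetical.

```python
import math
from typing import List, Sequence


def centroid(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Component-wise mean of equally sized vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / n for d in range(dim)]


def distinctness_scores(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Score each vector by its Euclidean distance from the centroid.

    Assumption: distance from the centroid stands in for
    "distinct from the majority"; the paper may use a different criterion.
    """
    c = centroid(vectors)
    return [math.dist(v, c) for v in vectors]


def select_most_distinct(vectors: Sequence[Sequence[float]], k: int) -> List[int]:
    """Return the indices of the k highest-scoring vectors, in ascending order."""
    scores = distinctness_scores(vectors)
    ranked = sorted(range(len(vectors)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


# Hypothetical toy data: four similar signals and one that stands apart.
signals = [
    [0.0, 0.1],
    [0.1, 0.0],
    [0.05, 0.05],
    [2.0, 2.0],
    [0.0, 0.0],
]
selected = select_most_distinct(signals, k=1)
```

With this toy data, the vector at index 3 lies far from the other four, so it is the one selected. A context-aware variant would compute the centroid only over signals that share the same context embedding, which is the role the abstract assigns to the Context Encoder.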

