CEMar 15

From Text to Alpha: Can LLMs Track Evolving Signals in Corporate Disclosures?

Chanyeol Choi, Yoon Kim, Yu Yu, Young Cha, V. Zach Golkhou, Igor Halperin, Georgios Papaioannou, Minkyu Kim, Zhangyang Wang, Jihoon Kwon, Minjae Kim, Alejandro Lopez-Lira

arXiv:2510.0319597.1h-index: 24

AI Analysis

This addresses the challenge for quantitative finance practitioners of under-exploiting narrative signals in corporate data, though it is an incremental improvement over existing NLP methods.

The paper tackles the problem of extracting predictive signals from corporate disclosures by using LLMs to capture nuanced semantic changes, achieving more than twice the risk-adjusted alpha compared to a baseline method.

Natural language processing (NLP) has been widely used in quantitative finance, but traditional methods often struggle to capture rich narratives in corporate disclosures, leaving potentially informative signals under-explored. Large language models (LLMs) offer a promising alternative due to their ability to extract nuanced semantics. In this paper, we ask whether semantic signals extracted by LLMs from corporate disclosures predict alpha, defined as abnormal returns beyond broad market movements and common risk factors. We introduce a simple framework, LLM as extractor, embedding as ruler, which extracts context-aware, metric-focused textual spans and quantifies semantic changes across consecutive disclosure periods using embedding-based similarity. This allows us to measure the degree of metric shifting -- how much firms move away from previously emphasized metrics, referred as moving targets. In experiments with portfolio and cross-sectional regression tests against a recent NER-based baseline, our method achieves more than twice the risk-adjusted alpha and shows significantly stronger predictive power. Qualitative analysis suggests that these gains stem from preserving contextual qualifiers and filtering out non-metric terms that keyword-based approaches often miss.

View on arXiv PDF

Similar