Text analysis in financial disclosures
This work is an incremental review for researchers in financial analysis and computational linguistics, outlining the current state and future opportunities in text-based financial disclosure analysis.
This paper reviews existing literature on text analysis in financial disclosures, identifying limitations of current quantitative and sentiment-focused methods. It highlights the potential of unstructured text for financial health assessment and points towards broader future research directions.
Financial disclosure analysis and knowledge extraction are important problems in financial analysis. Prevailing methods depend predominantly on quantitative ratios and techniques, which suffer from limitations such as window dressing and a backward-looking focus. Most of the content of a firm's financial disclosures is unstructured text, which carries valuable information about the firm's health. Neither humans nor machines analyze it satisfactorily: humans because of its enormous volume, and machines because of its unstructured nature. Researchers have recently started analyzing the text content of disclosures. This paper covers previous work on unstructured data analysis in finance and accounting. It also explores state-of-the-art methods in computational linguistics and reviews current methodologies in Natural Language Processing (NLP). Specifically, it focuses on research related to the text source, linguistic attributes, firm attributes, and mathematical models employed in the text analysis approach. This work contributes to disclosure analysis methods by highlighting the limitations of the current focus on sentiment metrics and by identifying broader areas for future research.