Form 10-Q Itemization
This work addresses a domain-specific problem for financial analysts and regulators: it provides a tool that automates the extraction of machine-readable data from quarterly reports. The contribution is incremental, since it builds on existing methods such as CNNs and rule-based systems.
The paper tackles the challenge of retrieving item-specific information from unstructured Form 10-Q financial filings by developing a solution that combines a rule-based algorithm with a Convolutional Neural Network image classifier, enabling rapid data extraction from large volumes of textual data.
The quarterly financial statement, or Form 10-Q, is one of the most frequently required filings through which US public companies disclose financial and other important business information. Because of the massive volume of 10-Q filings and the enormous variation in reporting format, retrieving item-specific information from 10-Q filings that lack a machine-readable hierarchy has been a long-standing challenge. This paper presents a solution for itemizing 10-Q filings that complements a rule-based algorithm with a Convolutional Neural Network (CNN) image classifier. The solution demonstrates a pipeline that can be generalized to rapid data retrieval across large volumes of textual data using only typographic features. The extracted text can serve as unlabeled, content-specific data for training transformer models (e.g., BERT) or be used in various field-specific natural language processing (NLP) applications.
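To make the idea of rule-based itemization concrete, the following is a minimal sketch and not the paper's implementation. It assumes headings appear at the start of a line in the form "Part I" / "Part II" and "Item 1." / "Item 1A.", which many filings violate. The regular expressions, the `itemize` function, and the optional `is_heading` callback are hypothetical names introduced for illustration; `is_heading` only marks where a second-stage check (such as the CNN image classifier the abstract mentions) could confirm or reject a candidate heading.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

# Assumed heading shapes (illustrative only): "PART II - ..." and "Item 1A. ..."
PART_HEADING = re.compile(r"^\s*part\s+(i{1,2})\b", re.IGNORECASE)
ITEM_HEADING = re.compile(r"^\s*item\s+(\d{1,2}[a-z]?)\s*[.:\-]", re.IGNORECASE)

SectionKey = Tuple[Optional[str], str]  # (part, item), e.g. ("II", "1A")


def itemize(
    lines: List[str],
    is_heading: Optional[Callable[[str], bool]] = None,
) -> Dict[SectionKey, str]:
    """Split filing text into sections keyed by (part, item).

    `is_heading` is a hypothetical hook for a second-stage check that can
    veto a regex candidate; when omitted, the regex match alone decides.
    """
    sections: Dict[SectionKey, List[str]] = {}
    part: Optional[str] = None
    current: Optional[SectionKey] = None

    for line in lines:
        part_match = PART_HEADING.match(line)
        if part_match:
            # A new part starts; text is unassigned until the next item heading.
            part = part_match.group(1).upper()
            current = None
            continue

        item_match = ITEM_HEADING.match(line)
        if item_match and (is_heading is None or is_heading(line)):
            current = (part, item_match.group(1).upper())
            sections.setdefault(current, [])
            continue

        if current is not None:
            sections[current].append(line)

    return {key: "\n".join(body).strip() for key, body in sections.items()}


sample = [
    "PART I - FINANCIAL INFORMATION",
    "Item 1. Financial Statements",
    "Revenue was up.",
    "see Item 2 for details",
    "Item 2. Management's Discussion",
    "MD&A text.",
    "PART II - OTHER INFORMATION",
    "Item 1. Legal Proceedings",
    "None.",
]
result = itemize(sample)
```

Keying sections by both part and item matters because Form 10-Q reuses item numbers across Part I and Part II. The in-line mention "see Item 2 for details" is not treated as a heading here only because it does not start the line; distinguishing true headings from such references in real filings is the kind of ambiguity a classifier stage would be expected to resolve.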