Predicting Distresses using Deep Learning of Text Segments in Annual Reports
This work aims to improve the accuracy of corporate distress prediction for financial analysts and regulators. It is an incremental advance that extends existing numerical models with text analysis.
The paper tackles corporate distress prediction by combining unstructured textual data from annual reports with traditional numerical financial variables. It finds that the text yields a statistically significant improvement in predictive performance, particularly for large firms, and that auditors' reports are more informative than management statements.
Corporate distress models typically employ only the numerical financial variables in firms' annual reports. We develop a model that also employs the unstructured textual data in the reports, namely the auditors' reports and managements' statements. Our model consists of a convolutional recurrent neural network which, when its output is concatenated with the numerical financial variables, learns a descriptive representation of the text that is suited for corporate distress prediction. We find that the unstructured data provides a statistically significant enhancement of distress prediction performance, in particular for large firms, where accurate predictions are of the utmost importance. Furthermore, we find that auditors' reports are more informative than managements' statements, and that a joint model including both managements' statements and auditors' reports displays no enhancement over a model including only auditors' reports. Our model demonstrates a direct improvement over existing state-of-the-art models.
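To make the architecture concrete, here is a minimal forward-pass sketch in pure Python. It shows how a convolutional recurrent text encoder could be combined with numerical variables: word embeddings, a 1-D convolution with ReLU, a recurrent layer whose final state serves as the text representation, concatenation with the numerical financial variables, and a logistic output giving a distress probability. Everything specific here is an assumption for illustration and is not taken from the paper. That includes the sizes (`VOCAB_SIZE`, `EMBED_DIM`, `N_FILTERS`, `KERNEL`, `HIDDEN`, `N_NUMERIC`), the function names, the random weights, and the use of a plain tanh recurrence where a GRU or LSTM might be used in practice. Training, for example by minimizing binary cross-entropy end to end, is omitted.

```python
import math
import random

# Hypothetical hyperparameters; the abstract does not specify any of these.
VOCAB_SIZE = 50
EMBED_DIM = 8
N_FILTERS = 6
KERNEL = 3
HIDDEN = 5
N_NUMERIC = 4

rng = random.Random(0)


def rand_matrix(rows, cols, scale=0.1):
    """Small random weights; a trained model would learn these end to end."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    """Matrix-vector product on nested lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


embedding = rand_matrix(VOCAB_SIZE, EMBED_DIM)
conv_w = rand_matrix(N_FILTERS, KERNEL * EMBED_DIM)  # each filter spans KERNEL words
conv_b = [0.0] * N_FILTERS
rnn_wx = rand_matrix(HIDDEN, N_FILTERS)
rnn_wh = rand_matrix(HIDDEN, HIDDEN)
rnn_b = [0.0] * HIDDEN
out_w = rand_matrix(1, HIDDEN + N_NUMERIC)[0]  # weights over [text repr, numeric vars]
out_b = 0.0


def conv1d(vectors):
    """Slide KERNEL-wide filters over the word vectors, with ReLU activation."""
    feats = []
    for i in range(len(vectors) - KERNEL + 1):
        window = [x for vec in vectors[i:i + KERNEL] for x in vec]
        pre = matvec(conv_w, window)
        feats.append([max(0.0, p + b) for p, b in zip(pre, conv_b)])
    return feats


def rnn_last_state(seq):
    """Plain tanh recurrence over conv features; returns the final hidden state."""
    h = [0.0] * HIDDEN
    for x in seq:
        a = matvec(rnn_wx, x)
        r = matvec(rnn_wh, h)
        h = [math.tanh(ai + ri + bi) for ai, ri, bi in zip(a, r, rnn_b)]
    return h


def distress_probability(token_ids, numeric_features):
    """Concatenate the text representation with numerical variables -> P(distress)."""
    vectors = [embedding[t] for t in token_ids]
    text_repr = rnn_last_state(conv1d(vectors))
    z = text_repr + list(numeric_features)
    logit = sum(w * x for w, x in zip(out_w, z)) + out_b
    return 1.0 / (1.0 + math.exp(-logit))


# Usage with a made-up tokenized text segment and made-up financial ratios.
tokens = [3, 17, 42, 8, 11, 5]
ratios = [0.2, -1.1, 0.05, 0.7]
p = distress_probability(tokens, ratios)
print(round(p, 3))
```

Concatenating before the output layer lets the gradient from the distress label shape the text encoder. This is one way to read the abstract's point that the representation is learned to suit distress prediction rather than being fixed in advance.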