CL, AI, CE | Jun 6, 2023

Financial Numeric Extreme Labelling: A Dataset and Benchmarking for XBRL Tagging

arXiv:2306.03723v1 | 17 citations | h-index: 43
Originality: Incremental advance
AI Analysis

The paper addresses the need for automated XBRL tagging in the financial reports of public companies. The advance is incremental because it builds on existing extreme classification methods.

The paper tackles the problem of automatically labelling numeric spans in financial statements with one of 2,794 labels. It releases the FNXL dataset and benchmarks two approaches: sequence labelling, and a pipeline of span extraction followed by extreme classification. The two perform comparably, with the pipeline holding a slight edge on the least frequent labels.

The U.S. Securities and Exchange Commission (SEC) mandates all public companies to file periodic financial statements that should contain numerals annotated with a particular label from a taxonomy. In this paper, we formulate the task of automating the assignment of a label to a particular numeral span in a sentence from an extremely large label set. Towards this task, we release a dataset, Financial Numeric Extreme Labelling (FNXL), annotated with 2,794 labels. We benchmark the performance of the FNXL dataset by formulating the task as (a) a sequence labelling problem and (b) a pipeline with span extraction followed by Extreme Classification. Although the two approaches perform comparably, the pipeline solution provides a slight edge for the least frequent labels.
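The pipeline formulation in (b) can be pictured with a minimal sketch: first find candidate numeral spans in a sentence, then hand each span and its context to a classifier that picks a label. Everything below is an illustrative assumption, not the paper's implementation: the regular expression, the function names, the label strings `EXAMPLE_REVENUE_LABEL` and `NO_LABEL`, and the keyword-based `toy_classifier` that stands in for an extreme classifier over the 2,794 labels.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical pattern for numeral spans such as "1,250.5" or "2022".
# The span extractor actually used in the paper is not described here.
NUMERAL = re.compile(r"\d[\d,]*(?:\.\d+)?")

Span = Tuple[int, int, str]  # (start offset, end offset, matched text)


def extract_numeric_spans(sentence: str) -> List[Span]:
    """Stage 1: locate candidate numeral spans in the sentence."""
    return [(m.start(), m.end(), m.group()) for m in NUMERAL.finditer(sentence)]


def label_spans(
    sentence: str,
    classify: Callable[[str, int, int], str],
) -> List[Tuple[str, str]]:
    """Stage 2: assign one label to each extracted span."""
    return [
        (text, classify(sentence, start, end))
        for start, end, text in extract_numeric_spans(sentence)
    ]


def toy_classifier(sentence: str, start: int, end: int) -> str:
    """Stand-in for an extreme classifier (assumption, illustration only).

    It inspects the 20 characters before the span and applies a keyword rule.
    """
    context = sentence[max(0, start - 20):start].lower()
    return "EXAMPLE_REVENUE_LABEL" if "revenue" in context else "NO_LABEL"


if __name__ == "__main__":
    example = "Total revenue was 1,250.5 million in 2022."
    for text, label in label_spans(example, toy_classifier):
        print(text, label)
```

Under the sequence labelling formulation in (a), by contrast, a single model would tag every token of the sentence directly, with no separate span extraction stage.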

