CL · AI · Feb 21, 2025

HiFi-KPI: A Dataset for Hierarchical KPI Extraction from Earnings Filings

arXiv:2502.15411v2 · 1 citation · h-index: 20
AI Analysis

This paper addresses the challenge of label transferability in financial reporting, which matters to both companies and regulators, but the contribution is incremental because it builds on existing iXBRL standards.

The authors tackle the extraction of hierarchical key performance indicators from unstructured financial text by introducing the HiFi-KPI dataset, which includes ~1.8M paragraphs and ~5M entities linked to iXBRL taxonomies. They also provide baselines using encoder-based approaches and LLMs.

The U.S. Securities and Exchange Commission (SEC) requires that public companies file financial reports tagging numbers with the machine-readable inline eXtensible Business Reporting Language (iXBRL) standard. However, the highly complex and highly granular taxonomy defined by iXBRL limits label transferability across domains. In this paper, we introduce the Hierarchical Financial Key Performance Indicator (HiFi-KPI) dataset, designed to facilitate numerical KPI extraction at specified levels of granularity from unstructured financial text. Our approach organizes a 218,126-label hierarchy using a taxonomy-based grouping method, investigating which taxonomy layer provides the most meaningful structure. HiFi-KPI comprises ~1.8M paragraphs and ~5M entities, each linked to a label in the iXBRL-specific calculation and presentation taxonomies. We provide baselines using encoder-based approaches and structured extraction using Large Language Models (LLMs). To simplify LLM inference and evaluation, we additionally release HiFi-KPI Lite, a manually curated subset with four expert-mapped labels. We publicly release all artifacts.
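
To make the idea of extracting KPIs "at specified levels of granularity" concrete, the following is a minimal sketch of rolling fine-grained labels up to an ancestor at a chosen depth in a label hierarchy. It is not the authors' grouping method, which the abstract does not detail. The `PARENT` map, the label names, and the helpers `path_to_root` and `roll_up` are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical child -> parent edges, loosely modelled on a presentation
# taxonomy. The label names are illustrative and not taken from the dataset.
PARENT = {
    "RevenueFromProductSales": "Revenues",
    "RevenueFromServices": "Revenues",
    "Revenues": "IncomeStatement",
    "InterestExpenseDebt": "InterestExpense",
    "InterestExpense": "IncomeStatement",
    "IncomeStatement": None,  # root of this toy hierarchy
}


def path_to_root(label, parent=PARENT):
    """Return the chain of labels from the root down to `label`."""
    path = []
    while label is not None:
        path.append(label)
        label = parent[label]
    return path[::-1]


def roll_up(label, depth, parent=PARENT):
    """Map `label` to its ancestor at `depth` (root is depth 0).

    Labels that already sit above `depth` are returned unchanged.
    """
    path = path_to_root(label, parent)
    return path[min(depth, len(path) - 1)]


# Three fine-grained entity labels collapsed to depth 1 of the toy hierarchy.
entities = ["RevenueFromProductSales", "RevenueFromServices", "InterestExpenseDebt"]
coarse = Counter(roll_up(e, depth=1) for e in entities)
```

In this toy setup, a smaller `depth` yields fewer, coarser classes and a larger `depth` preserves more of the original granularity, which is the kind of trade-off involved in asking which taxonomy layer gives the most meaningful structure.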

Code Implementations: 1 repo