LLM-based Triplet Extraction from Financial Reports
This work addresses the challenge of knowledge graph construction for financial analysts with a semi-automated pipeline, although the contribution is incremental, building on existing LLM and ontology methods.
The paper tackles the problem of evaluating triplet extraction from financial reports without annotated ground truth by using ontology-driven proxy metrics. It achieves 100% schema conformance with an automatically induced ontology and, through a hybrid verification strategy, reduces apparent subject hallucination rates from 65.2% to 1.6%.
Corporate financial reports are a valuable source of structured knowledge for Knowledge Graph construction, but the lack of annotated ground truth in this domain makes evaluation difficult. We present a semi-automated pipeline for Subject-Predicate-Object triplet extraction that uses ontology-driven proxy metrics, specifically Ontology Conformance and Faithfulness, instead of ground-truth-based evaluation. We compare a static, manually engineered ontology against a fully automated, document-specific ontology induction approach across different LLMs and two corporate annual reports. The automatically induced ontology achieves 100% schema conformance in all configurations, eliminating the ontology drift observed with the manual approach. We also propose a hybrid verification strategy that combines regex matching with an LLM-as-a-judge check, reducing apparent subject hallucination rates from 65.2% to 1.6% by filtering false positives caused by coreference resolution. Finally, we identify a systematic asymmetry between subject and object hallucinations, which we attribute to passive constructions and omitted agents in financial prose.
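The abstract does not spell out how Ontology Conformance is computed. One plausible reading is the share of extracted triplets whose predicate is declared in the ontology and whose subject and object types match an allowed signature for that predicate, similar in spirit to RDFS domain and range constraints. The sketch below follows that reading. The `Triplet` dataclass, the `Schema` mapping, the function names and the example entities are all hypothetical illustrations, not the paper's code or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triplet:
    """A typed Subject-Predicate-Object triplet (this field layout is an assumption)."""
    subject: str
    subject_type: str
    predicate: str
    obj: str
    object_type: str


# Hypothetical schema format: predicate -> allowed (subject_type, object_type) pairs.
Schema = dict[str, set[tuple[str, str]]]


def conforms(t: Triplet, schema: Schema) -> bool:
    """True if the predicate is declared and its type signature is allowed."""
    allowed = schema.get(t.predicate)
    return allowed is not None and (t.subject_type, t.object_type) in allowed


def ontology_conformance(triplets: list[Triplet], schema: Schema) -> float:
    """Fraction of triplets that conform to the schema (1.0 means 100% conformance)."""
    if not triplets:
        raise ValueError("no triplets to evaluate")
    return sum(conforms(t, schema) for t in triplets) / len(triplets)


if __name__ == "__main__":
    # Invented example schema and triplets, for illustration only.
    schema: Schema = {
        "reportsRevenue": {("Company", "MonetaryAmount")},
        "hasSubsidiary": {("Company", "Company")},
    }
    triplets = [
        Triplet("Acme plc", "Company", "reportsRevenue", "EUR 4.2bn", "MonetaryAmount"),
        Triplet("Acme plc", "Company", "hasSubsidiary", "Acme GmbH", "Company"),
        # Predicate missing from the schema: one form of ontology drift.
        Triplet("Acme plc", "Company", "employs", "12,000 staff", "Quantity"),
    ]
    print(ontology_conformance(triplets, schema))  # 0.6666666666666666
```

Under this reading, drift appears as triplets that use predicates or type pairs outside the schema, pulling the score below 1.0. The reported 100% for the induced ontology would correspond to every extracted triplet passing a check of this kind.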
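The hybrid verification strategy can be sketched as a two-stage check, assuming it works roughly as the abstract describes. First, each extracted subject or object is matched literally against the source passage with a regex. Only the misses are escalated to an LLM judge, which can recognise coreferent mentions such as "the Group". The `Judge` callable is a hypothetical interface. `alias_judge` is a toy stand-in that consults a hand-written alias table instead of calling a model. All entity names and text are invented for illustration.

```python
import re
from typing import Callable, Optional

# Hypothetical judge interface: (entity, passage) -> True if the passage supports
# the entity, e.g. through a coreferent mention. In practice this would be an LLM call.
Judge = Callable[[str, str], bool]


def regex_match(entity: str, passage: str) -> bool:
    """Case-insensitive literal match of the whole entity string in the passage.

    Lookarounds replace a word-boundary anchor so that entities ending in
    punctuation (e.g. "Acme Inc.") can still match.
    """
    pattern = r"(?<!\w)" + re.escape(entity) + r"(?!\w)"
    return re.search(pattern, passage, flags=re.IGNORECASE) is not None


def is_grounded(entity: str, passage: str, judge: Judge) -> bool:
    """Hybrid check: regex first; only regex misses are escalated to the judge."""
    return regex_match(entity, passage) or judge(entity, passage)


def hallucination_rate(entities: list[str], passage: str,
                       judge: Optional[Judge] = None) -> float:
    """Share of entities flagged as unsupported; judge=None means regex only."""
    if not entities:
        raise ValueError("no entities to verify")
    if judge is None:
        flagged = [e for e in entities if not regex_match(e, passage)]
    else:
        flagged = [e for e in entities if not is_grounded(e, passage, judge)]
    return len(flagged) / len(entities)


# Toy alias table standing in for the judge's coreference ability (invented data).
ALIASES = {"Acme plc": ["the Group", "the Company"]}


def alias_judge(entity: str, passage: str) -> bool:
    """Stand-in for an LLM-as-a-judge: accepts an entity if a known alias occurs."""
    return any(regex_match(alias, passage) for alias in ALIASES.get(entity, []))


if __name__ == "__main__":
    passage = "The Group reported revenue of EUR 4.2bn and acquired Beta Ltd."
    subjects = ["Acme plc", "Acme plc", "Gamma Corp"]
    objects = ["EUR 4.2bn", "Beta Ltd"]
    print(hallucination_rate(subjects, passage))               # 1.0 (regex only)
    print(hallucination_rate(subjects, passage, alias_judge))  # 0.3333333333333333
    print(hallucination_rate(objects, passage))                # 0.0
```

On this toy passage, regex alone flags every subject because the company is mentioned only as "The Group", while both objects match literally. The judge clears the coreferent subject and leaves only the genuinely absent one. This mirrors in miniature the subject/object asymmetry the abstract describes, but it does not reproduce the reported 65.2% and 1.6% figures.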