CL, Apr 5, 2024

Assessing the quality of information extraction

Filip Seitl, Tomáš Kovářík, Soheyla Mirshahi, Jan Kryštůfek, Rastislav Dujava, Matúš Ondreička, Herbert Ullrich, Petr Gronat

arXiv:2404.04068v22.77 citations, h-index: 3

Originality: Synthesis-oriented

AI Analysis

This paper addresses the need for reliable quality assessment in information extraction applications, though it appears incremental because it builds on existing LLM capabilities.

The paper tackles the problem of objectively measuring the quality of information extraction from unstructured data using large language models, introducing an automatic framework with scores to evaluate extraction quality and completeness.

Advances in large language models have notably enhanced the efficiency of information extraction from unstructured and semi-structured data sources. As these technologies become integral to various applications, establishing an objective measure for the quality of information extraction becomes imperative. However, the scarcity of labeled data presents significant challenges to this endeavor. In this paper, we introduce an automatic framework to assess the quality of information extraction/retrieval and its completeness. The framework focuses on information extraction in the form of an entity and its properties. We discuss how to handle the input/output size limitations of large language models and analyze their performance when extracting the information. In particular, we introduce scores to evaluate the quality of the extraction and provide an extensive discussion of how to interpret them.
