CL · AI · Sep 26, 2023

Fine-tuning and aligning question answering models for complex information extraction tasks

Matthias Engelbach, Dennis Klau, Felix Scheerer, Jens Drawehn, Maximilien Kintz

arXiv:2309.14805v1 · 2.57 citations · h-index: 7

Originality: Incremental advance

AI Analysis

This work addresses the need for more reliable document analysis in business environments, such as insurance and medical sectors, by leveraging extractive models to reduce hallucinations, though it is incremental as it builds on existing QA methods.

The authors tackled the problem of unreliable information extraction from business documents by using and fine-tuning extractive question answering models, achieving improved performance for complex linguistic features like damage cause explanations and medication descriptions with only a small annotated dataset.

The emergence of Large Language Models (LLMs) has boosted performance and possibilities in various NLP tasks. While the usage of generative AI models like ChatGPT opens up new opportunities for several business use cases, their current tendency to hallucinate fake content strongly limits their applicability to document analysis, such as information retrieval from documents. In contrast, extractive language models like question answering (QA) or passage retrieval models guarantee that query results are found within the boundaries of the corresponding context document, which makes them candidates for more reliable information extraction in production environments at companies. In this work we propose an approach that uses and integrates extractive QA models into a document analysis solution for improved feature extraction from German business documents such as insurance reports or medical leaflets. We further show that fine-tuning existing German QA models boosts performance for tailored extraction tasks of complex linguistic features like damage cause explanations or descriptions of medication appearance, even when using only a small set of annotated data. Finally, we discuss the relevance of scoring metrics for evaluating information extraction tasks and derive a combined metric from Levenshtein distance, F1-Score, Exact Match and ROUGE-L to mimic the assessment criteria of human experts.
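The abstract names the four ingredients of the combined metric but not how they are weighted or normalized. The following is a minimal sketch of how such a score could be assembled, not the authors' formula: the equal weights, the lowercase/whitespace tokenization, the normalization of Levenshtein distance into a similarity, and all function names are assumptions made for illustration.

```python
from collections import Counter


def exact_match(pred: str, gold: str) -> float:
    """1.0 if the strings match after trimming and lowercasing, else 0.0."""
    return float(pred.strip().lower() == gold.strip().lower())


def _f_measure(hits: int, n_pred: int, n_gold: int) -> float:
    """Harmonic mean of precision and recall from a hit count."""
    if hits == 0:
        return 0.0
    precision, recall = hits / n_pred, hits / n_gold
    return 2 * precision * recall / (precision + recall)


def token_f1(pred: str, gold: str) -> float:
    """Bag-of-tokens F1, in the style of SQuAD evaluation."""
    p, g = pred.lower().split(), gold.lower().split()
    if not p or not g:
        return float(p == g)
    overlap = sum((Counter(p) & Counter(g)).values())
    return _f_measure(overlap, len(p), len(g))


def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def levenshtein_similarity(pred: str, gold: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(pred), len(gold))
    return 1.0 if longest == 0 else 1.0 - levenshtein(pred, gold) / longest


def lcs_length(a: list, b: list) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l(pred: str, gold: str) -> float:
    """ROUGE-L F-measure over whitespace tokens."""
    p, g = pred.lower().split(), gold.lower().split()
    if not p or not g:
        return float(p == g)
    return _f_measure(lcs_length(p, g), len(p), len(g))


def combined_score(pred: str, gold: str,
                   weights=(0.25, 0.25, 0.25, 0.25)) -> float:
    """Weighted mean of the four scores; equal weights are an assumption."""
    scores = (
        levenshtein_similarity(pred.lower(), gold.lower()),
        token_f1(pred, gold),
        exact_match(pred, gold),
        rouge_l(pred, gold),
    )
    return sum(w * s for w, s in zip(weights, scores))
```

Calling `combined_score(predicted_span, annotated_span)` yields a value between 0 and 1. In practice the weights would need to be tuned against expert judgments, which is the calibration step the abstract alludes to.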
