CL IR LG May 24, 2024

Adapting PromptORE for Modern History: Information Extraction from Hispanic Monarchy Documents of the XVIth Century

Hèctor Loopez Hidalgo, Michel Boeglin, David Kahn, Josiane Mothe, Diego Ortiz, David Panzoli

arXiv:2406.00027v1 1.0 h-index: 3 Has Code

Originality: Incremental advance

AI Analysis

This work addresses information extraction challenges for historians and archivists working with non-English historical texts, though it is incremental as it adapts an existing method to a specialized domain.

The researchers tackled the problem of extracting semantic relations from historical Spanish documents, specifically trials from the Spanish Inquisition, by adapting PromptORE with a fine-tuning approach called 'biasing' and with prompt engineering to handle complex entity placements and genderism. Their method achieved up to a 50% improvement in accuracy compared to baseline PromptORE models.

Identifying semantic relations among entities is a widely accepted approach to relation extraction. PromptORE (Prompt-based Open Relation Extraction) was designed to improve relation extraction with Large Language Models on general-domain documents. However, it is less effective when applied to historical documents in languages other than English. In this study, we introduce an adaptation of PromptORE to extract relations from specialized documents, namely digital transcripts of trials from the Spanish Inquisition. Our approach involves fine-tuning transformer models with their pretraining objective on the data on which they will perform inference. We refer to this process as "biasing". Our Biased PromptORE addresses the complex entity placements and genderism that occur in Spanish texts; we solve these issues through prompt engineering. We evaluate our method using encoder-like models, corroborating our findings with experts' assessments. Additionally, we evaluate performance using a binomial classification benchmark. Our results show a substantial improvement in accuracy, up to 50%, for our Biased PromptORE models in comparison to the baseline models using standard PromptORE.
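As a rough illustration of the two ingredients the abstract names, the sketch below builds a PromptORE-style prompt and applies random token masking of the kind used by a masked-language-model pretraining objective. Everything in it is an assumption for illustration: the template `{sentence} {head} [MASK] {tail}.`, the function names `build_prompt` and `mask_tokens`, the 15% masking rate, and the example sentence are hypothetical and are not taken from the paper or its code. The masking is also simplified (real encoder pretraining usually mixes mask, random, and unchanged replacements).

```python
import random

MASK = "[MASK]"  # placeholder token, as used by BERT-like encoders


def build_prompt(sentence, head, tail,
                 template="{sentence} {head} [MASK] {tail}."):
    """Append a cloze-style prompt to the sentence (hypothetical template).

    The encoder's representation of the [MASK] position would then be
    treated as a representation of the relation between head and tail.
    """
    return template.format(sentence=sentence.strip(), head=head, tail=tail)


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Simplified masking step for a masked-language-model objective.

    Returns (masked_tokens, labels); labels hold the original token at
    masked positions and None elsewhere. This is a sketch, not the
    paper's training procedure.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(tok)   # the model must recover this token
        else:
            masked.append(tok)
            labels.append(None)  # position ignored by the loss
    return masked, labels


# Made-up example sentence, not drawn from the Inquisition corpus.
prompt = build_prompt("Juan fue acusado por Pedro.", "Pedro", "Juan")
masked, labels = mask_tokens(prompt.split())
```

Under this reading, "biasing" would correspond to running the masking objective over the target transcripts before extracting relations from them, and the prompt template is the place where entity order and gendered wording could be adjusted.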
