CL · Jun 17, 2024

How Good are LLMs at Relation Extraction under Low-Resource Scenario? Comprehensive Evaluation

arXiv:2406.11162v2 · 10 citations · Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses relation extraction for low-resource languages. The contribution is incremental: it extends existing datasets and methods without introducing new techniques.

The paper tackles poor relation extraction performance in low-resource languages by constructing datasets in 10 such languages and evaluating open-source LLMs on them. It finds that these models still underperform because of data scarcity.

Relation Extraction (RE) is a crucial technology for transforming unstructured text into structured information, especially in Knowledge Graph development, and it plays an essential role in various downstream tasks. Besides conventional RE methods based on neural networks and pre-trained language models, large language models (LLMs) are also used in RE research. However, on low-resource languages (LRLs), both conventional and LLM-based RE methods perform poorly because of data scarcity. To this end, this paper constructs low-resource relation extraction datasets in 10 LRLs from three regions (Central Asia, Southeast Asia and the Middle East). The corpora are constructed by translating publicly available English RE datasets (NYT10, FewRel and CrossRE) with an effective multilingual machine translation system. Then, we use language perplexity (PPL) to filter low-quality data out of the translated datasets. Finally, we conduct an empirical study and validate the performance of several open-source LLMs on the generated LRL RE datasets.
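The perplexity-based filtering step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which language model or threshold the authors use, so everything here is an assumption for illustration: a toy add-one-smoothed unigram model stands in for a real language model, and the names `train_unigram`, `perplexity`, `filter_by_ppl` and the cutoff `max_ppl` are hypothetical. The only part taken as given is the standard definition of perplexity, the exponential of the average negative log-probability per token.

```python
import math
from collections import Counter


def train_unigram(corpus):
    """Toy stand-in for a language model (assumption, not the paper's model).

    Builds an add-one-smoothed unigram model from whitespace-tokenized
    sentences and returns a function mapping a token to its log-probability.
    """
    counts = Counter(tok for sent in corpus for tok in sent.split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # +1 reserves probability mass for unseen tokens

    def log_prob(token):
        return math.log((counts[token] + 1) / (total + vocab))

    return log_prob


def perplexity(sentence, log_prob):
    """PPL = exp(-(1/N) * sum of log p(token)) over the N tokens."""
    tokens = sentence.split()
    if not tokens:
        return float("inf")  # treat empty text as maximally implausible
    avg_nll = -sum(log_prob(t) for t in tokens) / len(tokens)
    return math.exp(avg_nll)


def filter_by_ppl(sentences, log_prob, max_ppl):
    """Keep only sentences whose perplexity is at or below max_ppl."""
    return [s for s in sentences if perplexity(s, log_prob) <= max_ppl]


# Usage: sentences that look like the reference text score a low PPL,
# while garbled output scores a high PPL and is dropped.
reference = ["the cat sat on the mat", "the dog sat on the rug"]
lp = train_unigram(reference)
candidates = ["the cat sat on the rug", "zxq vvb qqp"]
kept = filter_by_ppl(candidates, lp, max_ppl=10.0)  # hypothetical threshold
print(kept)
```

In practice the scorer would be a language model for the target language rather than a unigram table, and the threshold would be tuned per language. The filtering logic stays the same.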

Code Implementations: 1 repo