DARE: Data Augmented Relation Extraction with GPT-2
This work addresses data scarcity and class imbalance in Relation Extraction, particularly in biomedical domains, by using a fine-tuned language model for data augmentation; it is an incremental advance over existing methods.
The paper tackles limited training data and class imbalance in real-world Relation Extraction (RE) tasks by introducing DARE, a method that fine-tunes GPT-2 to generate examples for specific relation types and combines this generated data with the gold dataset to train a BERT-based classifier. DARE improves on a strong baseline by up to 11 F1 points and sets a new state of the art on three biomedical RE datasets, surpassing the previous best results by 4.7 F1 points on average.
Real-world Relation Extraction (RE) tasks are challenging because of limited training data or class imbalance. In this work, we present Data Augmented Relation Extraction (DARE), a simple method that augments training data by fine-tuning GPT-2 to generate examples for specific relation types. The generated training data is then used in combination with the gold dataset to train a BERT-based RE classifier. In a series of experiments we show the advantages of our method, which leads to improvements of up to 11 F1 points over a strong baseline. DARE also achieves a new state of the art on three widely used biomedical RE datasets, surpassing the previous best results by 4.7 F1 points on average.
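The step of combining generated and gold data might be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names `augment_dataset` and `make_stub_generator` are hypothetical, the stub generator stands in for a GPT-2 model fine-tuned on one relation type, and the policy of topping up every relation type to the size of the largest class is an assumption that the abstract does not specify.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, str]  # (sentence, relation label)


def augment_dataset(
    gold: List[Example],
    generators: Dict[str, Callable[[int], List[str]]],
) -> List[Example]:
    """Combine gold examples with generated ones, one generator per relation type.

    Assumption (not stated in the abstract): each relation type is topped up
    to the size of the largest class in the gold data.
    """
    counts = Counter(label for _, label in gold)
    target = max(counts.values())
    augmented = list(gold)  # gold examples are always kept
    for label, n_gold in counts.items():
        missing = target - n_gold
        if missing > 0 and label in generators:
            for sentence in generators[label](missing):
                augmented.append((sentence, label))
    return augmented


def make_stub_generator(label: str) -> Callable[[int], List[str]]:
    """Hypothetical placeholder for a GPT-2 model fine-tuned on one relation type."""
    def generate(n: int) -> List[str]:
        return [f"<synthetic {label} sentence {i}>" for i in range(n)]
    return generate


# Toy gold data with an imbalanced label distribution (invented for illustration).
gold = [
    ("DrugA inhibits ProteinB.", "inhibits"),
    ("DrugC inhibits ProteinD.", "inhibits"),
    ("DrugE inhibits ProteinF.", "inhibits"),
    ("DrugG activates ProteinH.", "activates"),
]
generators = {label: make_stub_generator(label) for label in ("inhibits", "activates")}
train_set = augment_dataset(gold, generators)
print(Counter(label for _, label in train_set))
```

In a real pipeline the stub would be replaced by sampling from the fine-tuned generator for that relation type, and `train_set` would be passed to the BERT-based RE classifier for training.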