CL · May 23, 2024

Data Augmentation Method Utilizing Template Sentences for Variable Definition Extraction

arXiv:2405.14962v1 · 2 citations · h-index: 4 · NLDB
Originality: Incremental advance
AI Analysis

This work addresses the cost of creating field-specific training data for variable definition extraction, offering an incremental improvement for researchers and practitioners in scientific domains.

The study tackles the extraction of variable definitions from scientific papers, where performance varies across fields because the characteristics of definitions differ. It proposes a data augmentation method that combines template sentences with variable-definition pairs. A model trained on the augmented data achieved 89.6% accuracy on chemical process papers.

The extraction of variable definitions from scientific and technical papers is essential for understanding these documents. However, the characteristics of variable definitions, such as the length and the words that make up the definition, differ among fields, which leads to differences in the performance of existing extraction methods across fields. Although preparing training data specific to each field can improve the performance of the methods, it is costly to create high-quality training data. To address this challenge, this study proposes a new method that generates new definition sentences from template sentences and variable-definition pairs in the training data. The proposed method has been tested on papers about chemical processes, and the results show that the model trained with the definition sentences generated by the proposed method achieved a higher accuracy of 89.6%, surpassing existing models.
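
The core idea of the abstract, filling template sentences with variable-definition pairs to produce new training sentences, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the slot tokens `[VAR]` and `[DEF]`, the function names `fill_template` and `augment`, the example sentences, and the choice to combine every template with every pair.

```python
import itertools
import random

# Hypothetical slot tokens marking where a variable and its definition go.
VAR_SLOT = "[VAR]"
DEF_SLOT = "[DEF]"


def fill_template(template, variable, definition):
    """Insert one variable-definition pair into one template sentence."""
    return template.replace(VAR_SLOT, variable).replace(DEF_SLOT, definition)


def augment(templates, pairs, max_sentences=None, seed=0):
    """Generate labeled sentences from every (template, pair) combination.

    If max_sentences is given and smaller than the number of combinations,
    a reproducible random subset of that size is returned instead.
    """
    generated = [
        {"text": fill_template(t, v, d), "variable": v, "definition": d}
        for t, (v, d) in itertools.product(templates, pairs)
    ]
    if max_sentences is not None and max_sentences < len(generated):
        generated = random.Random(seed).sample(generated, max_sentences)
    return generated


# Illustrative inputs only; real templates and pairs would come from
# the annotated training data.
templates = ["where [VAR] is [DEF].", "Let [VAR] denote [DEF]."]
pairs = [("T", "the reactor temperature"), ("F", "the feed flow rate")]

examples = augment(templates, pairs)
for example in examples:
    print(example["text"])
```

Each generated record keeps the variable and definition alongside the sentence, so it could serve as a labeled example for an extraction model. How the paper actually selects templates and pairs, and how many sentences it generates, is not stated in the abstract.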

