SE · Sep 5, 2021

Semi-Automated Labeling of Requirement Datasets for Relation Extraction

Jeremias Bohn, Jannik Fischbach, Martin Schmitt, Hinrich Schütze, Andreas Vogelsang

arXiv:2109.02050v161.6654 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses dataset creation challenges for researchers and practitioners in requirements engineering, but it is incremental as it builds on existing labeling methods.

The authors address the labor-intensive and bias-prone manual labeling of datasets for relation extraction by proposing a semi-automatic labeling framework. They also provide a dataset of preprocessed sentences from the requirements engineering domain and, in a case study, report a substantial overlap between human and automatic labels.

Creating datasets manually by human annotators is a laborious task that can lead to biased and inhomogeneous labels. We propose a flexible, semi-automatic framework for labeling data for relation extraction. Furthermore, we provide a dataset of preprocessed sentences from the requirements engineering domain, including a set of automatically created as well as hand-crafted labels. In our case study, we compare the human and automatic labels and show that there is a substantial overlap between both annotations.
