An Approach to Reducing Annotation Costs for BioNLP
This work addresses annotation cost reduction for BioNLP researchers, but it appears incremental, since it builds on a previously developed algorithm without introducing new methods.
The paper addresses high annotation costs in BioNLP tasks by applying an active learning algorithm called ClosestInitPA, which is designed for tasks with specific characteristics such as redundancy in the training material and imbalanced datasets.
Active learning (AL) can significantly reduce annotation costs for a broad range of BioNLP tasks, and an AL algorithm we have developed is particularly effective for these tasks. This previously developed algorithm, ClosestInitPA, works best on tasks with the following characteristics: redundancy in the training material, burdensome annotation costs, good performance from Support Vector Machines (SVMs), and imbalanced datasets (i.e., when the task is set up as a binary classification problem, one class is substantially rarer than the other). Many BioNLP tasks have these characteristics, so our AL algorithm is a natural approach for them.
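The abstract does not describe how ClosestInitPA works internally, so the following is only a generic sketch of two ingredients often used when AL is paired with SVMs on imbalanced data: measuring class imbalance in an initial labeled sample, and querying the unlabeled examples whose SVM decision values lie closest to the hyperplane. The function names, labels, and scores are hypothetical illustrations and are not taken from the paper; this should not be read as the authors' algorithm.

```python
from typing import List, Sequence


def imbalance_ratio(labels: Sequence[int]) -> float:
    """Return (#negative / #positive) for binary labels in {+1, -1}."""
    pos = sum(1 for y in labels if y == 1)
    neg = sum(1 for y in labels if y == -1)
    if pos == 0:
        raise ValueError("no positive examples in the labeled sample")
    return neg / pos


def select_closest(decision_values: Sequence[float], batch_size: int) -> List[int]:
    """Return indices of the pool examples nearest the decision boundary.

    decision_values are assumed to be signed SVM outputs f(x); a small
    |f(x)| means the current model is least certain about that example.
    """
    ranked = sorted(range(len(decision_values)),
                    key=lambda i: abs(decision_values[i]))
    return ranked[:batch_size]


# Hypothetical initial labeled sample: one positive among ten examples.
initial_labels = [1, -1, -1, -1, -1, -1, -1, -1, -1, -1]
# Hypothetical SVM decision values for six unlabeled pool examples.
pool_scores = [2.3, -0.1, 0.8, -1.7, 0.05, -0.4]

ratio = imbalance_ratio(initial_labels)           # 9.0
query = select_closest(pool_scores, batch_size=2)  # [4, 1]
```

In a full AL loop, the examples at the returned indices would be sent to an annotator, added to the labeled set, and the SVM retrained before the next round. The imbalance ratio is the kind of quantity that could inform asymmetric misclassification costs, though whether and how ClosestInitPA does so is not stated in this abstract.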