Deep Active Learning with Budget Annotation
This work addresses the challenge of expensive data labeling for machine learning practitioners, but it appears incremental as it builds on existing active learning methods by adding informativeness and leveraging pre-trained models.
The paper tackles the problem of high annotation costs for unlabeled data by proposing a hybrid active learning approach that combines uncertainty and informativeness measures, using pre-trained models to reduce labeling effort. Experiments on multiple datasets demonstrate the approach's efficacy, though no concrete numbers are provided.
Most digital data collected over the decades, and most data currently being produced with information technology, is unlabeled, that is, data without a description. Unlabeled data is relatively easy to acquire but expensive to label, even with the use of domain experts. Most recent work addresses this problem with active learning driven by uncertainty metrics. Although most uncertainty-based selection strategies are very effective, they fail to take the informativeness of the unlabeled instances into account and are prone to querying outliers. To address these challenges, we propose a hybrid approach that computes both the uncertainty and the informativeness of an instance, then automatically labels those instances using a budget annotator. To reduce the annotation cost, we employ state-of-the-art pre-trained models in order to avoid querying information already contained in those models. Our extensive experiments on different datasets demonstrate the efficacy of the proposed approach.
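The abstract does not say how uncertainty and informativeness are computed or combined, so the following is only an illustrative sketch of a hybrid query-selection step, not the paper's method. It assumes predictive entropy as the uncertainty measure, mean cosine similarity to the rest of the pool as a stand-in for informativeness (a density weighting, which gives outliers a low score), and a simple product to combine them. The function names, the `beta` exponent, and the toy data are all hypothetical.

```python
import numpy as np

def entropy_uncertainty(probs):
    """Shannon entropy of each row of predicted class probabilities."""
    p = np.clip(probs, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)

def density_score(features):
    """Mean cosine similarity of each instance to the whole pool.

    Assumption: density stands in for 'informativeness'; isolated
    points (likely outliers) receive a low score.
    """
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)
    return (unit @ unit.T).mean(axis=1)

def select_queries(probs, features, budget, beta=1.0):
    """Indices of the `budget` instances with the highest hybrid score."""
    density = np.maximum(density_score(features), 0.0)  # negative similarity counts as zero
    score = entropy_uncertainty(probs) * density ** beta
    return np.argsort(-score)[:budget]

# Toy pool: three clustered points plus one outlier (index 3).
features = np.array([[1.0, 0.0], [0.9, 0.1], [0.95, 0.05], [-1.0, 0.0]])
# Hypothetical model outputs; the outlier happens to be the most uncertain.
probs = np.array([[0.9, 0.1], [0.6, 0.4], [0.55, 0.45], [0.5, 0.5]])

most_uncertain = int(np.argmax(entropy_uncertainty(probs)))
picked = select_queries(probs, features, budget=2)
```

In this toy pool, pure uncertainty sampling would query the outlier first, while the hybrid score skips it and selects the two uncertain points inside the cluster. In practice `features` could be embeddings from a pre-trained model, though how the paper uses such models is not described in the abstract.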