Bayesian active learning for production, a systematic study and a reusable library
This work addresses practical challenges of active learning in production environments. It is incremental: it builds on existing methods to improve their applicability.
The paper tackles the gap between theoretical active learning techniques and real-world constraints. It analyzes three common issues (model convergence, annotation error, and dataset imbalance) and presents two approaches that speed up the active learning loop: partial uncertainty sampling and larger query sizes. It also introduces an open-source library, BaaL.
Active learning can reduce labelling effort by using a machine learning model to query the user for specific inputs. While there are many papers on new active learning techniques, these techniques rarely satisfy the constraints of a real-world project. In this paper, we analyse the main drawbacks of current active learning techniques and present approaches to alleviate them. We conduct a systematic study of how the most common issues of real-world datasets affect the deep active learning process: model convergence, annotation error, and dataset imbalance. We derive two techniques that can speed up the active learning loop: partial uncertainty sampling and larger query size. Finally, we present our open-source Bayesian active learning library, BaaL.
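The two speed-up techniques named above can be illustrated with a minimal sketch. It assumes that partial uncertainty sampling means scoring only a random subset of the unlabelled pool, and that a larger query size means labelling more items per loop iteration. The abstract does not spell out either definition, so both are assumptions. The sketch uses plain NumPy rather than the BaaL API. The names `predictive_entropy`, `partial_uncertainty_query`, and `toy_predict` are hypothetical, and predictive entropy stands in for whichever uncertainty heuristic is actually used.

```python
import numpy as np


def predictive_entropy(probs):
    """Entropy of the mean prediction over stochastic forward passes.

    probs has shape (n_items, n_mc_samples, n_classes).
    Returns an array of shape (n_items,).
    """
    mean = probs.mean(axis=1)
    return -(mean * np.log(mean + 1e-12)).sum(axis=-1)


def partial_uncertainty_query(pool_size, predict_fn, subset_size, query_size, rng):
    """Pick items to label by scoring only a random subset of the pool.

    Assumption: predict_fn(indices) returns class probabilities of shape
    (len(indices), n_mc_samples, n_classes) for the given pool indices.
    """
    subset_size = min(subset_size, pool_size)
    # Score a random subset instead of the whole pool (cheaper per iteration).
    candidates = rng.choice(pool_size, size=subset_size, replace=False)
    scores = predictive_entropy(predict_fn(candidates))
    # A larger query_size means fewer retraining rounds for the same label budget.
    k = min(query_size, subset_size)
    most_uncertain = np.argsort(-scores)[:k]
    return candidates[most_uncertain]


def toy_predict(indices):
    """Hypothetical stand-in for a model: random softmax outputs, 20 MC samples, 3 classes."""
    logits = np.random.default_rng(1).normal(size=(len(indices), 20, 3))
    exp = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return exp / exp.sum(axis=-1, keepdims=True)


rng = np.random.default_rng(0)
to_label = partial_uncertainty_query(
    pool_size=1000, predict_fn=toy_predict, subset_size=200, query_size=50, rng=rng
)
```

Here `to_label` holds 50 pool indices drawn from the 200 scored candidates. In a real loop, those items would be sent for annotation, moved to the labelled set, and the model retrained before the next query.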