Making Better Use of Unlabelled Data in Bayesian Active Learning
This work addresses inefficient data acquisition for machine learning practitioners, though it appears incremental, as it builds on existing Bayesian active learning and semi-supervised techniques.
The paper proposes a semi-supervised framework for Bayesian active learning that leverages unlabelled data. It yields better-performing models than either conventional Bayesian active learning or semi-supervised learning on randomly acquired data, and it is easier to scale.
Fully supervised models are predominant in Bayesian active learning. We argue that their neglect of the information present in unlabelled data harms not just predictive performance but also decisions about what data to acquire. Our proposed solution is a simple framework for semi-supervised Bayesian active learning. We find it produces better-performing models than either conventional Bayesian active learning or semi-supervised learning with randomly acquired data. It is also easier to scale up than the conventional approach. As well as supporting a shift towards semi-supervised models, our findings highlight the importance of studying models and acquisition methods in conjunction.
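To make concrete how the model feeds into decisions about what data to acquire, here is a minimal sketch of one standard Bayesian acquisition function, BALD (Bayesian active learning by disagreement). It is an illustrative assumption, not necessarily the acquisition function or framework used in this work. The sketch assumes that approximate posterior samples are available as the predictions of an ensemble of models. In a semi-supervised setup, those models would be trained on top of a representation learned from all inputs, labelled and unlabelled, so better use of unlabelled data changes these predictions and therefore which input gets acquired. The names `bald_score` and `acquire` are hypothetical.

```python
import math
from typing import List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def bald_score(member_probs: Sequence[Sequence[float]]) -> float:
    """BALD score for one input: H[mean prediction] - mean H[member prediction].

    member_probs[k][c] is ensemble member k's probability for class c.
    Treating members as posterior samples is an assumption of this sketch.
    """
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    # Predictive distribution: average over (approximate) posterior samples.
    mean_probs = [
        sum(m[c] for m in member_probs) / n_members for c in range(n_classes)
    ]
    # Expected entropy of individual members (aleatoric part).
    expected_entropy = sum(entropy(m) for m in member_probs) / n_members
    # The difference is the mutual information between label and parameters.
    return entropy(mean_probs) - expected_entropy


def acquire(pool_member_probs: List[Sequence[Sequence[float]]]) -> int:
    """Return the index of the unlabelled pool input with the highest BALD score."""
    scores = [bald_score(mp) for mp in pool_member_probs]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical pool of three inputs, each scored by a two-member ensemble.
pool = [
    [[0.9, 0.1], [0.9, 0.1]],      # members agree and are confident
    [[0.5, 0.5], [0.5, 0.5]],      # members agree but are uncertain
    [[0.99, 0.01], [0.01, 0.99]],  # members disagree strongly
]
print(acquire(pool))  # selects index 2
```

The second input has maximal predictive entropy yet scores zero, because BALD only rewards uncertainty that stems from disagreement about the model parameters, which is exactly the kind of uncertainty that the model, and hence its use of unlabelled data, shapes.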