Towards Self-Adaptive Pseudo-Label Filtering for Semi-Supervised Learning
This work addresses a practical bottleneck in semi-supervised learning by automating pseudo-label filtering to improve model accuracy, though the contribution is incremental because it builds on existing SSL methods.
The paper tackles hand-crafted pseudo-label filtering in semi-supervised learning, which discards correct pseudo labels and admits incorrect ones. It proposes a self-adaptive filter that models the confidence distributions of pseudo labels to improve performance, with gains that are most pronounced when labeled data is scarce.
Recent semi-supervised learning (SSL) methods typically include a filtering strategy to improve the quality of pseudo labels. However, these filtering strategies are usually hand-crafted and do not change as the model is updated, so many correct pseudo labels are discarded and many incorrect pseudo labels are selected during training. In this work, we observe that a gap between the confidence distributions of correct and incorrect pseudo labels emerges at the very beginning of training, and that this gap can be used to filter pseudo labels. Based on this observation, we propose a Self-Adaptive Pseudo-Label Filter (SPF), which automatically filters noise in pseudo labels as the model evolves by modeling the confidence distribution throughout the training process. Specifically, with an online mixture model, we weight each pseudo-labeled sample by the posterior probability of it being correct, which takes into account the confidence distribution at that time. Unlike previous hand-crafted filters, SPF evolves together with the deep neural network without manual tuning. Extensive experiments demonstrate that incorporating SPF into existing SSL methods improves their performance, especially when labeled data is extremely scarce.
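The posterior weighting described above can be illustrated with a minimal sketch. The abstract only states that an online mixture model is used, so everything specific here is an assumption made for illustration: the two Gaussian components, the batch EM fit (rather than an online update), and the hypothetical helper names `fit_confidence_mixture`, `posterior_correct`, and `weighted_pseudo_label_loss`. The component with the higher mean confidence is assumed to correspond to correct pseudo labels.

```python
import math


def _posteriors(x, weights, means, variances):
    """Posterior probability of each mixture component for one confidence x."""
    log_joint = [
        math.log(w) - 0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(log_joint)  # subtract the max for numerical stability
    joint = [math.exp(lj - top) for lj in log_joint]
    total = sum(joint)
    return [j / total for j in joint]


def fit_confidence_mixture(conf, n_iter=100, eps=1e-6):
    """Fit a two-component 1-D Gaussian mixture to confidences with batch EM.

    Assumption: Gaussian components and a batch fit stand in for the
    unspecified online mixture model.
    """
    data = sorted(conf)
    n = len(data)
    mean_all = sum(data) / n
    var_all = sum((x - mean_all) ** 2 for x in data) / n + eps
    # Start the two components at the lower and upper quartiles.
    means = [data[n // 4], data[(3 * n) // 4]]
    variances = [var_all, var_all]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample.
        resp = [_posteriors(x, weights, means, variances) for x in data]
        # M-step: re-estimate weight, mean, and variance of each component.
        for k in range(2):
            nk = sum(r[k] for r in resp) + eps
            weights[k] = nk / (n + 2 * eps)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            variances[k] = (
                sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
                + eps
            )
    return weights, means, variances


def posterior_correct(x, weights, means, variances):
    """Posterior that confidence x belongs to the higher-mean component."""
    k = 0 if means[0] > means[1] else 1
    return _posteriors(x, weights, means, variances)[k]


def weighted_pseudo_label_loss(losses, sample_weights):
    """Average per-sample losses, each scaled by its posterior weight."""
    return sum(w * l for w, l in zip(sample_weights, losses)) / len(losses)
```

In a training loop, such a mixture would be refit (or updated) as the confidence distribution shifts, and each unlabeled sample's loss would be scaled by `posterior_correct` instead of being kept or dropped by a fixed threshold.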