LG AI CV Jan 18, 2023

Enhancing Self-Training Methods

Aswathnarayan Radhakrishnan, Jim Davis, Zachary Rabin, Benjamin Lewis, Matthew Scherreik, Roman Ilin

arXiv:2301.07294v1 5.32 citations h-index: 13

Originality: Incremental advance

AI Analysis

This work addresses a key bottleneck in semi-supervised learning for practitioners using self-training, though it appears incremental in nature.

The paper tackles confirmation bias in self-training methods for semi-supervised learning, which causes performance to saturate, and proposes enhancements that show performance gains over existing design choices across multiple datasets.

Semi-supervised learning approaches train on small sets of labeled data along with large sets of unlabeled data. Self-training is a semi-supervised teacher-student approach that often suffers from the problem of "confirmation bias" that occurs when the student model repeatedly overfits to incorrect pseudo-labels given by the teacher model for the unlabeled data. This bias impedes improvements in pseudo-label accuracy across self-training iterations, leading to unwanted saturation in model performance after just a few iterations. In this work, we describe multiple enhancements to improve the self-training pipeline to mitigate the effect of confirmation bias. We evaluate our enhancements over multiple datasets showing performance gains over existing self-training design choices. Finally, we also study the extendability of our enhanced approach to Open Set unlabeled data (containing classes not seen in labeled data).
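The teacher-student loop described in the abstract can be sketched as follows. This is a minimal, generic self-training baseline, not the paper's enhanced pipeline. The nearest-centroid "model", the softmax-over-distance confidence score, the function names, and the `threshold` and `iterations` parameters are all assumptions made for illustration.

```python
import math


def fit_centroids(points, labels):
    """Toy 'model': the mean 2-D point of each class."""
    sums, counts = {}, {}
    for (x, y), c in zip(points, labels):
        sx, sy = sums.get(c, (0.0, 0.0))
        sums[c] = (sx + x, sy + y)
        counts[c] = counts.get(c, 0) + 1
    return {c: (sums[c][0] / counts[c], sums[c][1] / counts[c]) for c in sums}


def predict_proba(model, point):
    """Softmax over negative squared distances to each class centroid."""
    scores = {
        c: -((point[0] - cx) ** 2 + (point[1] - cy) ** 2)
        for c, (cx, cy) in model.items()
    }
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}


def self_train(labeled_x, labeled_y, unlabeled_x, iterations=3, threshold=0.9):
    """Generic self-training: teacher pseudo-labels, student retrains, repeat."""
    teacher = fit_centroids(labeled_x, labeled_y)
    for _ in range(iterations):
        pseudo_x, pseudo_y = [], []
        for p in unlabeled_x:
            probs = predict_proba(teacher, p)
            label = max(probs, key=probs.get)
            # Keep only confident pseudo-labels. A confident but wrong label
            # still passes this filter and is fed back into training, which
            # is how confirmation bias enters the loop.
            if probs[label] >= threshold:
                pseudo_x.append(p)
                pseudo_y.append(label)
        # The student trains on labeled data plus the pseudo-labeled data,
        # then becomes the teacher for the next iteration.
        student = fit_centroids(labeled_x + pseudo_x, labeled_y + pseudo_y)
        teacher = student
    return teacher
```

Calling `self_train(labeled_x, labeled_y, unlabeled_x)` with a handful of labeled 2-D points and a larger list of unlabeled ones returns the final centroid model. Because each new teacher is trained partly on the previous teacher's own predictions, early pseudo-label errors tend to persist across iterations, which is the saturation effect the abstract describes.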
