InstanT: Semi-supervised Learning with Instance-dependent Thresholds
This paper addresses a fundamental challenge in semi-supervised learning, the selection of confident unlabeled instances, by introducing a thresholding approach that is more flexible than existing global or class-dependent thresholds.
The paper tackles the problem of selecting confident unlabeled instances in semi-supervised learning by proposing instance-dependent thresholds. These thresholds use instance-level ambiguity and the instance-dependent error rates of pseudo-labels to assign higher thresholds to instances that are more likely to have incorrect pseudo-labels. The resulting threshold function provides a bounded probabilistic guarantee for pseudo-label correctness.
Semi-supervised learning (SSL) has been a fundamental challenge in machine learning for decades. The primary family of SSL algorithms, known as pseudo-labeling, involves assigning pseudo-labels to confident unlabeled instances and incorporating them into the training set. Therefore, the selection criteria of confident instances are crucial to the success of SSL. Recently, there has been growing interest in the development of SSL methods that use dynamic or adaptive thresholds. Yet, these methods typically apply the same threshold to all samples, or use class-dependent thresholds for instances belonging to a certain class, while neglecting instance-level information. In this paper, we propose to study instance-dependent thresholds, which offer the highest degree of freedom compared with existing methods. Specifically, we devise a novel instance-dependent threshold function for all unlabeled instances by utilizing their instance-level ambiguity and the instance-dependent error rates of pseudo-labels, so instances that are more likely to have incorrect pseudo-labels will have higher thresholds. Furthermore, we demonstrate that our instance-dependent threshold function provides a bounded probabilistic guarantee for the correctness of the pseudo-labels it assigns.
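The general idea of an instance-dependent threshold can be sketched in a few lines of Python. This is a minimal illustration only, not the threshold function proposed in the paper: the names `instance_threshold` and `select_pseudo_labels`, the margin-based ambiguity measure, the equal weighting of ambiguity and error rate, and the `base` and `max_threshold` values are all assumptions made for this example. The error rates are taken as given inputs here; how they are estimated is not covered by this sketch.

```python
from typing import List, Optional, Sequence, Tuple


def instance_threshold(
    probs: Sequence[float],
    error_rate: float = 0.0,
    base: float = 0.7,            # hypothetical lowest threshold
    max_threshold: float = 0.99,  # hypothetical highest threshold
) -> float:
    """Illustrative per-instance threshold (assumed form, not the paper's).

    The threshold rises with the instance's ambiguity and with an
    externally supplied estimate of its pseudo-label error rate.
    Requires at least two classes.
    """
    top1, top2 = sorted(probs, reverse=True)[:2]
    # Ambiguity is near 0 when one class dominates and 1 when the top two tie.
    ambiguity = 1.0 - (top1 - top2)
    # Assumed combination: equal weight on ambiguity and error rate.
    risk = min(1.0, 0.5 * ambiguity + 0.5 * error_rate)
    return base + (max_threshold - base) * risk


def select_pseudo_labels(
    batch_probs: Sequence[Sequence[float]],
    error_rates: Optional[Sequence[float]] = None,
) -> List[Tuple[int, int, float]]:
    """Return (index, pseudo_label, threshold) for each accepted instance."""
    selected = []
    for i, probs in enumerate(batch_probs):
        err = error_rates[i] if error_rates is not None else 0.0
        tau = instance_threshold(probs, error_rate=err)
        label = max(range(len(probs)), key=lambda k: probs[k])
        # Accept only if the model's confidence clears this instance's threshold.
        if probs[label] >= tau:
            selected.append((i, label, tau))
    return selected


if __name__ == "__main__":
    batch = [[0.97, 0.02, 0.01], [0.55, 0.40, 0.05], [0.80, 0.19, 0.01]]
    for idx, label, tau in select_pseudo_labels(batch, [0.0, 0.0, 0.9]):
        print(f"instance {idx}: pseudo-label {label}, threshold {tau:.3f}")
```

In this sketch, the third instance has a confidence of 0.80 and would pass a fixed global threshold of 0.7, but it is rejected because its assumed error rate of 0.9 raises its individual threshold. A global or class-dependent threshold cannot express that distinction between two instances of the same predicted class.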