RealMix: Towards Realistic Semi-Supervised Deep Learning Algorithms
The work addresses a practical limitation of semi-supervised learning in real-world applications, where labeled and unlabeled data distributions may not align, though it appears incremental because it builds on prior SSL methods.
The paper tackles the poor performance of semi-supervised learning algorithms when labeled and unlabeled data distributions differ. It introduces RealMix, which achieves state-of-the-art results, including a 9.79% error rate on CIFAR10 with 250 labels, and is the only method tested that surpasses baseline performance under significant distribution mismatch.
Semi-Supervised Learning (SSL) algorithms have shown great potential in training regimes where access to labeled data is scarce but access to unlabeled data is plentiful. However, our experiments illustrate several shortcomings that prior SSL algorithms suffer from. In particular, they perform poorly when the unlabeled and labeled data distributions differ. To address these shortcomings, we develop RealMix, which achieves state-of-the-art results on standard benchmark datasets across different labeled and unlabeled set sizes while overcoming the aforementioned challenges. Notably, RealMix achieves an error rate of 9.79% on CIFAR10 with 250 labels and is the only SSL method tested that surpasses baseline performance when there is significant mismatch between the labeled and unlabeled data distributions. RealMix demonstrates how SSL can be used in real-world situations with limited access to both data and compute, and guides further research in SSL with practical applicability in mind.