LG | AI | Feb 18, 2025

Enhancing Semi-supervised Learning with Zero-shot Pseudolabels

arXiv:2502.12584v2 | 3 citations | h-index: 1 | Trans. Mach. Learn. Res.
Originality: Incremental advance
AI Analysis

This work addresses the challenge of deploying machine learning in resource-constrained settings by enabling efficient training with foundation models, though it is incremental as it builds on existing SSL and zero-shot techniques.

The paper tackles high labeling costs in machine learning by proposing ZeroMatch, a semi-supervised learning framework that integrates knowledge distillation with consistency-based learning to leverage labeled data, unlabeled data, and pseudo-labels from foundation models. It reports consistent improvements over standard SSL and zero-shot augmented methods across six benchmarks.

The high cost of data labeling presents a major barrier to deploying machine learning systems at scale. Semi-supervised learning (SSL) mitigates this challenge by utilizing unlabeled data alongside limited labeled examples, while the emergence of foundation models (FMs) offers powerful zero-shot capabilities that can further reduce labeling cost. However, directly fine-tuning large FMs is often impractical in resource-constrained settings, and naïvely using their pseudo-labels for unlabeled data can degrade performance because the pseudo-labels may be unreliable or mismatched with the target task's domain. In this work, we introduce ZeroMatch, a novel SSL framework that integrates knowledge distillation with consistency-based learning to jointly leverage labeled data, unlabeled data, and pseudo-labels from FMs. ZeroMatch enables training compact student models using only FM inference, making it suitable for low-resource environments such as personal devices with limited compute. Experiments on six vision and language classification benchmarks show that ZeroMatch consistently outperforms standard SSL and zero-shot augmented methods, demonstrating its effectiveness and robustness across a range of foundation model qualities.
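
To make the idea of combining the three signals concrete, here is a minimal sketch of one way such an objective could be assembled. It is not ZeroMatch's actual loss, which the abstract does not specify. It assumes a supervised cross-entropy term on labeled data, a distillation term that fits the FM's hard pseudo-labels on unlabeled data, and a FixMatch-style confidence-thresholded consistency term between weak and strong augmentations. The function names, the weights `kd_weight` and `cons_weight`, and the `threshold` value are all hypothetical.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_z for z in logits]

def nll(logits, label):
    """Cross-entropy of student logits against a hard class label."""
    return -log_softmax(logits)[label]

def mean(values):
    """Average of a list, defined as 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0

def combined_loss(labeled, unlabeled, kd_weight=1.0, cons_weight=1.0,
                  threshold=0.95):
    """Illustrative SSL objective (an assumption, not the paper's loss).

    labeled:   list of (student_logits, true_label)
    unlabeled: list of (weak_aug_logits, strong_aug_logits, fm_pseudo_label)
    """
    # Supervised term on the small labeled set.
    sup = mean([nll(logits, y) for logits, y in labeled])

    # Distillation term: fit the foundation model's zero-shot pseudo-labels.
    kd = mean([nll(weak, fm_label) for weak, _, fm_label in unlabeled])

    # Consistency term: the student's own confident prediction on the weak
    # view supervises the strong view; unconfident examples contribute zero.
    cons_terms = []
    for weak, strong, _ in unlabeled:
        log_probs = log_softmax(weak)
        pred = max(range(len(weak)), key=lambda i: log_probs[i])
        confident = math.exp(log_probs[pred]) >= threshold
        cons_terms.append(nll(strong, pred) if confident else 0.0)
    cons = mean(cons_terms)

    return sup + kd_weight * kd + cons_weight * cons
```

In this sketch, lowering `kd_weight` is one simple way to limit the damage from unreliable pseudo-labels, since the consistency term relies only on the student's own predictions.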
