Bridged Clustering for Representation Learning: Semi-Supervised Sparse Bridging
Bridged Clustering addresses label scarcity in semi-supervised learning, offering researchers and practitioners an interpretable and efficient alternative to dense methods, though its approach is incremental.
The paper tackles the problem of learning predictors from unpaired input and output datasets. It introduces Bridged Clustering, a semi-supervised framework that clusters inputs and outputs independently and learns a sparse bridge between them from a few paired examples, achieving performance competitive with state-of-the-art methods while remaining label-efficient.
We introduce Bridged Clustering, a semi-supervised framework for learning predictors from any unpaired input dataset $X$ and output dataset $Y$. Our method first clusters $X$ and $Y$ independently, then learns a sparse, interpretable bridge between the clusters using only a few paired examples. At inference, a new input $x$ is assigned to its nearest input cluster, and the centroid of the linked output cluster is returned as the prediction $\hat{y}$. Unlike traditional semi-supervised learning (SSL), Bridged Clustering explicitly leverages output-only data, and unlike dense transport-based methods, it maintains a sparse and interpretable alignment. Through theoretical analysis, we show that when the mis-clustering and mis-bridging rates are bounded, our algorithm is an effective and efficient predictor. Empirically, our method is competitive with state-of-the-art methods while remaining simple, model-agnostic, and highly label-efficient in low-supervision settings.
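The three steps described above (cluster each side independently, bridge the clusters with a few pairs, predict the linked output centroid) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the choice of k-means with farthest-point initialisation, the majority-vote rule for the bridge, the toy data, and all function names (`kmeans`, `fit_bridge`, `predict`) are hypothetical choices made for this example.

```python
from collections import Counter

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((u - v) ** 2 for u, v in zip(a, b))

def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda j: sq_dist(point, centroids[j]))

def kmeans(points, k, iters=20):
    """Plain k-means; the clustering algorithm is an assumption of this sketch."""
    # Deterministic farthest-point initialisation (illustrative choice).
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster empties
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids

def fit_bridge(x_centroids, y_centroids, pairs):
    """Link each input cluster to one output cluster by majority vote
    over the paired examples (assumed bridging rule)."""
    votes = {}
    for x, y in pairs:
        i = nearest(x, x_centroids)
        j = nearest(y, y_centroids)
        votes.setdefault(i, Counter())[j] += 1
    # Input clusters that received no paired example get no link here.
    return {i: counts.most_common(1)[0][0] for i, counts in votes.items()}

def predict(x, x_centroids, y_centroids, bridge):
    """Return the centroid of the output cluster linked to x's input cluster."""
    return y_centroids[bridge[nearest(x, x_centroids)]]

# Toy data (made up): unpaired inputs, unpaired outputs, and two paired examples.
X_unpaired = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
Y_unpaired = [(10.0,), (10.2,), (9.8,), (-3.0,), (-3.2,), (-2.8,)]
pairs = [((0.1, 0.1), (10.1,)), ((5.1, 5.0), (-3.1,))]

x_centroids = kmeans(X_unpaired, k=2)
y_centroids = kmeans(Y_unpaired, k=2)
bridge = fit_bridge(x_centroids, y_centroids, pairs)

y_hat = predict((0.3, 0.2), x_centroids, y_centroids, bridge)
print(y_hat)
```

On this toy data, the query `(0.3, 0.2)` falls in the input cluster near the origin, so the prediction is the centroid of the output cluster around 10. The bridge is a small dictionary with one entry per input cluster, which is what makes the alignment sparse and easy to inspect in this sketch; an input cluster with no paired example would need a fallback rule that the sketch does not provide.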