Supervising Unsupervised Learning
The framework provides a principled way to evaluate and apply unsupervised algorithms across domains, although its contribution is incremental in that it builds on existing supervised data.
The paper tackles the subjectivity of unsupervised learning by reducing it to supervised learning, using knowledge from heterogeneous supervised datasets. The result is improved performance across hundreds of problems with simple algorithms, along with provable bounds that circumvent Kleinberg's impossibility result for clustering.
We introduce a framework to transfer knowledge acquired from a repository of (heterogeneous) supervised datasets to new unsupervised datasets. Our perspective avoids the subjectivity inherent in unsupervised learning by reducing it to supervised learning, and provides a principled way to evaluate unsupervised algorithms. We demonstrate the versatility of our framework via simple agnostic bounds on unsupervised problems. In the context of clustering, our approach helps choose the number of clusters and the clustering algorithm, remove outliers, and provably circumvent Kleinberg's impossibility result. Experimental results across hundreds of problems demonstrate improved performance on unsupervised data with simple algorithms, despite the fact that our problems come from heterogeneous domains. Additionally, our framework lets us leverage deep networks to learn common features from many such small datasets, and perform zero-shot learning.
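The reduction described in the abstract can be illustrated with a small sketch. It is not the paper's implementation. It assumes a toy family of 1-D clustering algorithms, each defined by a gap threshold, and a tiny hand-made labelled repository. The names `gap_cluster`, `rand_index`, and `select_algorithm`, the candidate thresholds, and all data values are hypothetical. The sketch keeps the candidate with the best average agreement with the known labels across the repository, then applies it to a new unlabelled dataset, which also fixes the number of clusters there.

```python
from itertools import combinations


def gap_cluster(points, frac):
    """Toy 1-D clustering: start a new cluster wherever the gap between
    sorted neighbours exceeds frac * (max - min). Hypothetical family."""
    order = sorted(range(len(points)), key=lambda i: points[i])
    span = (max(points) - min(points)) or 1.0
    labels = [0] * len(points)
    current = 0
    for prev, nxt in zip(order, order[1:]):
        if points[nxt] - points[prev] > frac * span:
            current += 1
        labels[nxt] = current
    return labels


def rand_index(a, b):
    """Fraction of point pairs on which two labelings agree
    (same cluster in both, or different clusters in both).
    Assumes at least two points."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def select_algorithm(repository, candidates):
    """Pick the candidate with the best mean Rand index over the
    labelled repository (empirical risk minimisation over the family)."""
    def mean_score(frac):
        scores = [rand_index(gap_cluster(x, frac), y) for x, y in repository]
        return sum(scores) / len(scores)
    return max(candidates, key=mean_score)


# Hypothetical repository of small labelled datasets on different scales.
repository = [
    ([0.0, 0.1, 0.2, 5.0, 5.1, 5.2], [0, 0, 0, 1, 1, 1]),
    ([10, 11, 12, 30, 31, 50, 51], [0, 0, 0, 1, 1, 2, 2]),
    ([1.0, 1.5, 2.0, 9.0, 9.5], [0, 0, 0, 1, 1]),
]
candidates = [0.01, 0.05, 0.2, 0.6]  # assumed thresholds, for illustration

best_frac = select_algorithm(repository, candidates)

# Apply the selected algorithm to a new dataset that has no labels.
new_labels = gap_cluster([3.0, 3.2, 3.1, 8.0, 8.3, 15.0], best_frac)
print(best_frac, new_labels)
```

Because each threshold is expressed as a fraction of the data range, the same candidate can be scored on datasets with different scales, which loosely mirrors the heterogeneous-repository setting. A realistic version would swap in actual clustering algorithms and a richer candidate family.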