A Graph-Theoretic Framework for Understanding Open-World Semi-Supervised Learning
It addresses the lack of theoretical foundations for open-world semi-supervised learning, which is important for applications requiring inference of both known and novel classes, though it appears incremental by building on existing graph methods.
The paper tackles the problem of open-world semi-supervised learning by formalizing a graph-theoretic framework, resulting in a provable error bound for clustering known and novel classes and empirical performance that matches or outperforms baselines on benchmark datasets.
Open-world semi-supervised learning aims at inferring both known and novel classes in unlabeled data, by harnessing prior knowledge from a labeled set with known classes. Despite its importance, there is a lack of theoretical foundations for this problem. This paper bridges the gap by formalizing a graph-theoretic framework tailored for the open-world setting, where the clustering can be theoretically characterized by graph factorization. Our graph-theoretic framework illuminates practical algorithms and provides guarantees. In particular, based on our graph formulation, we apply the algorithm called Spectral Open-world Representation Learning (SORL), and show that minimizing our loss is equivalent to performing spectral decomposition on the graph. Such equivalence allows us to derive a provable error bound on the clustering performance for both known and novel classes, and analyze rigorously when labeled data helps. Empirically, SORL can match or outperform several strong baselines on common benchmark datasets, which is appealing for practical usage while enjoying theoretical guarantees.