Progressive Identification of True Labels for Partial-Label Learning
This addresses scaling issues in weakly supervised learning for big data applications, though it is incremental as it builds on existing PLL methods.
The paper tackles the computational bottleneck in partial-label learning by proposing a flexible framework that includes a novel risk estimator and a progressive identification algorithm, achieving state-of-the-art results in experiments.
Partial-label learning (PLL) is a typical weakly supervised learning problem, where each training instance is equipped with a set of candidate labels among which only one is the true label. Most existing methods elaborately designed learning objectives as constrained optimizations that must be solved in specific manners, making their computational complexity a bottleneck for scaling up to big data. The goal of this paper is to propose a novel framework of PLL with flexibility on the model and optimization algorithm. More specifically, we propose a novel estimator of the classification risk, theoretically analyze the classifier-consistency, and establish an estimation error bound. Then we propose a progressive identification algorithm for approximately minimizing the proposed risk estimator, where the update of the model and identification of true labels are conducted in a seamless manner. The resulting algorithm is model-independent and loss-independent, and compatible with stochastic optimization. Thorough experiments demonstrate it sets the new state of the art.