Boost Picking: A Universal Method on Converting Supervised Classification to Semi-supervised Classification
This work addresses the cost of data labeling in machine learning by offering a semi-supervised approach that could reduce reliance on labeled datasets, though applying weak classifiers to this known bottleneck appears incremental.
The paper tackles the problem of training supervised classification models with limited labeled data by proposing Boost Picking, a universal method that uses two weak classifiers to estimate and correct errors. Under specific conditions, a model trained mainly on unlabeled data can then be trained as effectively as the same model trained on 100% labeled data.
This paper proposes a universal method, Boost Picking, to train supervised classification models mainly on unlabeled data. Boost Picking uses only two weak classifiers to estimate and correct the error. It is theoretically proved that Boost Picking can train a supervised model mainly on unlabeled data as effectively as the same model trained on 100% labeled data, provided that the recalls of the two weak classifiers are both greater than zero and the sum of their precisions is greater than one. Based on Boost Picking, we present "Test along with Training (TawT)" to improve the generalization of supervised models. Both Boost Picking and TawT are successfully tested on various small data sets.
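The condition on the two weak classifiers can be illustrated with a minimal sketch. It is not the paper's procedure. It assumes each weak classifier produces binary predictions (1 = positive) that are scored against binary ground-truth labels. The names `precision_recall` and `satisfies_condition` and the toy data are hypothetical. The abstract does not specify how the two classifiers are constructed or which class each one targets.

```python
from typing import Sequence, Tuple


def precision_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (precision, recall) for binary labels, with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Define the metric as 0.0 when its denominator is empty.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def satisfies_condition(pr_a: Tuple[float, float], pr_b: Tuple[float, float]) -> bool:
    """Check the condition stated in the abstract:
    both recalls > 0 and the sum of the two precisions > 1."""
    (prec_a, rec_a), (prec_b, rec_b) = pr_a, pr_b
    return rec_a > 0 and rec_b > 0 and (prec_a + prec_b) > 1


# Toy data (assumed, for illustration only).
y_true = [1, 1, 1, 0, 0, 0]
pred_a = [1, 1, 0, 1, 0, 0]  # weak classifier A
pred_b = [1, 0, 0, 0, 0, 0]  # weak classifier B

pr_a = precision_recall(y_true, pred_a)
pr_b = precision_recall(y_true, pred_b)
ok = satisfies_condition(pr_a, pr_b)
```

Note that the condition is mild. Neither classifier needs to be individually strong, since a low-recall classifier with high precision can compensate for a less precise one as long as the two precisions sum to more than one.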