Continuous Sweep for Binary Quantification Learning
This work addresses the need for accurate class prevalence estimation in supervised machine learning, offering an incremental improvement over existing quantification methods.
The authors address binary quantification, the task of estimating class prevalence in a dataset, by introducing Continuous Sweep, a new parametric method that modifies Median Sweep. In simulations, Continuous Sweep outperforms the quantifiers in the Classify, Count, and Correct group and is competitive with the best Distribution Matchers; results on an empirical dataset are similar.
A quantifier is a supervised machine learning algorithm that estimates the class prevalence in a dataset rather than labeling its individual observations. We introduce Continuous Sweep, a new parametric binary quantifier inspired by the well-performing Median Sweep, an ensemble method based on Adjusted Count estimators. We modify two aspects of Median Sweep: 1) we use parametric class distributions instead of empirical distributions to estimate the true and false positive rates; 2) we use the mean instead of the median of a set of Adjusted Count estimates. These two modifications allow for a theoretical analysis of the bias and variance of Continuous Sweep. Furthermore, the expressions for the bias and variance can be used to define optimal decision boundaries for the set of Adjusted Count estimates used in the ensemble. We show in three simulation studies that Continuous Sweep outperforms the quantifiers in the group Classify, Count, and Correct, including Median Sweep, and is competitive with the two best quantifiers from the group Distribution Matchers. An empirical dataset is also analysed with these quantifiers, showing similar performance.
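To make the two modifications concrete, the following is a minimal sketch, not the authors' implementation. It assumes the standard Adjusted Count form, (cc - fpr) / (tpr - fpr), where cc is the fraction of test scores above a threshold. It further assumes normally distributed classifier scores per class, a discrete threshold grid in place of a continuous range, and a fixed cutoff `min_gap` on tpr - fpr instead of the optimal decision boundaries derived from the bias and variance expressions. All function names, parameters, and the simulated data are hypothetical.

```python
import math
import numpy as np


def normal_sf(x, mu, sigma):
    """Survival function 1 - Phi((x - mu) / sigma) of a normal distribution."""
    z = (np.asarray(x, dtype=float) - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 - np.vectorize(math.erf)(z))


def adjusted_count(cc, tpr, fpr):
    """Adjusted Count: correct the classify-and-count rate cc using tpr and fpr."""
    return (cc - fpr) / (tpr - fpr)


def median_sweep(train_scores, train_labels, test_scores, thresholds, min_gap=0.25):
    """Median of Adjusted Count estimates, using empirical tpr and fpr."""
    pos = train_scores[train_labels == 1]
    neg = train_scores[train_labels == 0]
    estimates = []
    for t in thresholds:
        tpr = np.mean(pos > t)
        fpr = np.mean(neg > t)
        if tpr - fpr > min_gap:  # skip thresholds with an unstable denominator
            cc = np.mean(test_scores > t)
            estimates.append(adjusted_count(cc, tpr, fpr))
    return float(np.clip(np.median(estimates), 0.0, 1.0))


def mean_sweep_parametric(train_scores, train_labels, test_scores, thresholds, min_gap=0.25):
    """Mean of Adjusted Count estimates, with tpr and fpr taken from fitted normals.

    Assumption: each class's scores are modelled as normal; the threshold grid
    and min_gap are illustrative stand-ins for the continuous, optimally
    bounded sweep described in the abstract.
    """
    pos = train_scores[train_labels == 1]
    neg = train_scores[train_labels == 0]
    tpr = normal_sf(thresholds, pos.mean(), pos.std(ddof=1))
    fpr = normal_sf(thresholds, neg.mean(), neg.std(ddof=1))
    keep = (tpr - fpr) > min_gap
    cc = np.array([np.mean(test_scores > t) for t in thresholds[keep]])
    estimates = adjusted_count(cc, tpr[keep], fpr[keep])
    return float(np.clip(estimates.mean(), 0.0, 1.0))


# Simulated classifier scores (hypothetical): negatives ~ N(0, 1), positives ~ N(2, 1).
rng = np.random.default_rng(0)
train_scores = np.concatenate([rng.normal(2.0, 1.0, 1000), rng.normal(0.0, 1.0, 1000)])
train_labels = np.concatenate([np.ones(1000, dtype=int), np.zeros(1000, dtype=int)])

# Test set with a true positive-class prevalence of 0.3.
test_scores = np.concatenate([rng.normal(2.0, 1.0, 1500), rng.normal(0.0, 1.0, 3500)])

thresholds = np.linspace(-1.0, 3.0, 81)
ms_est = median_sweep(train_scores, train_labels, test_scores, thresholds)
cs_est = mean_sweep_parametric(train_scores, train_labels, test_scores, thresholds)
print(f"median sweep estimate: {ms_est:.3f}")
print(f"parametric mean sweep estimate: {cs_est:.3f}")
```

In this sketch the two functions differ only in how tpr and fpr are obtained (empirical fractions versus fitted distributions) and in the aggregate taken over thresholds (median versus mean), which mirrors the two modifications listed above.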