Unbiased Estimations based on Binary Classifiers: A Maximum Likelihood Approach
The paper addresses a common problem for machine learning practitioners who work with imbalanced or shifting class distributions, and offers a practical approach to unbiased estimation.
The paper tackles the bias that binary classifiers introduce when they are applied to data sets whose proportion of positive items differs from that of the training data. It proposes a maximum likelihood estimator for the true proportion of positives and tests it on synthetic and real-world data, showing that it provides accurate estimates without prior knowledge of the distribution.
Binary classifiers trained on a certain proportion of positive items introduce a bias when applied to data sets with different proportions of positive items. Most solutions for dealing with this issue assume that some information on the latter distribution is known. However, this is not always the case, particularly when this proportion is itself the target variable. In this paper, a maximum likelihood estimator for the true proportion of positives in data sets is suggested and tested on synthetic and real-world data.
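
To make the idea concrete, the following is a minimal sketch of one standard way to correct a classifier's positive count, and not necessarily the estimator derived in the paper. It assumes that the classifier's true positive rate (tpr) and false positive rate (fpr) are known from held-out data and carry over unchanged to the target data set. Under that assumption the number of predicted positives is binomial with success probability p * tpr + (1 - p) * fpr, and the likelihood is maximized over p in [0, 1] by inverting this relation and clipping. The function name and its parameters are hypothetical.

```python
def estimate_positive_proportion(n_predicted_positive, n_total, tpr, fpr):
    """Constrained maximum likelihood estimate of the true proportion of positives.

    Assumption (not taken from the paper): tpr and fpr are known and are the
    same on the target data as on the data they were measured on.
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if not 0 <= n_predicted_positive <= n_total:
        raise ValueError("n_predicted_positive must lie between 0 and n_total")
    if tpr <= fpr:
        raise ValueError("the classifier must satisfy tpr > fpr")

    # Fraction of items the classifier labels as positive (the biased estimate).
    observed_rate = n_predicted_positive / n_total

    # Invert observed_rate = p * tpr + (1 - p) * fpr for p.
    p_hat = (observed_rate - fpr) / (tpr - fpr)

    # The binomial likelihood is unimodal in p, so the maximizer on [0, 1]
    # is the unconstrained solution clipped to that interval.
    return min(1.0, max(0.0, p_hat))


if __name__ == "__main__":
    # Illustrative numbers only: 340 of 1000 items predicted positive.
    print(estimate_positive_proportion(340, 1000, tpr=0.9, fpr=0.2))
```

With these illustrative numbers the raw classifier output suggests 34% positives, while the corrected estimate is roughly 20%, which shows how large the bias can be when the error rates are ignored.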