Data Uncertainty without Prediction Models
This work addresses data efficiency for machine learning practitioners, though its contribution is incremental because it builds on existing uncertainty-based active learning approaches.
The paper tackles the problem of costly data acquisition in machine learning by proposing a model-free uncertainty estimation method, Distance-weighted Class Impurity, which reduces the amount of data needed in active learning tasks.
Data acquisition processes for machine learning are often costly. To construct a high-performance prediction model with less data, the difficulty of predicting a candidate point is often used as the acquisition function when choosing which data point to add next. This degree of difficulty is referred to as the uncertainty of the prediction model. We propose an uncertainty estimation method, Distance-weighted Class Impurity, that does not explicitly use a prediction model. It estimates uncertainty from the distances to the data around a given location and from the class impurity of that data. We compared it on active learning tasks against several uncertainty estimation methods that rely on prediction models, and verified that Distance-weighted Class Impurity works effectively regardless of the prediction model used.
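The abstract does not give the formula, so the following is only a minimal sketch of how a model-free, distance-weighted impurity score could be computed. Every specific choice here is an assumption made for illustration and is not taken from the paper: the k-nearest-neighbour neighbourhood, the inverse-distance weights, the Gini impurity, and the hypothetical names `distance_weighted_class_impurity` and `select_next`.

```python
import numpy as np


def distance_weighted_class_impurity(x, X_labeled, y_labeled, k=5, eps=1e-12):
    """Illustrative score: Gini impurity of inverse-distance-weighted class
    proportions among the k nearest labeled points (all choices assumed)."""
    X_labeled = np.asarray(X_labeled, dtype=float)
    y_labeled = np.asarray(y_labeled)
    x = np.asarray(x, dtype=float)

    # Euclidean distance from the query location to every labeled point.
    dists = np.linalg.norm(X_labeled - x, axis=1)

    # Restrict to the k nearest labeled points.
    k = min(k, len(dists))
    nearest = np.argsort(dists)[:k]

    # Closer points get larger weights; eps avoids division by zero.
    weights = 1.0 / (dists[nearest] + eps)

    # Weighted class proportions in the neighbourhood.
    classes = np.unique(y_labeled)
    per_class = np.array(
        [weights[y_labeled[nearest] == c].sum() for c in classes]
    )
    p = per_class / weights.sum()

    # Gini impurity: 0 for a pure neighbourhood, larger when classes mix.
    return 1.0 - np.sum(p ** 2)


def select_next(candidates, X_labeled, y_labeled, k=5):
    """Return the index of the candidate with the highest impurity score."""
    scores = [
        distance_weighted_class_impurity(c, X_labeled, y_labeled, k=k)
        for c in candidates
    ]
    return int(np.argmax(scores))


# Toy data: two well-separated clusters, one per class.
X_labeled = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y_labeled = [0, 0, 0, 1, 1, 1]

pure = distance_weighted_class_impurity([0.2, 0.2], X_labeled, y_labeled, k=3)
mixed = distance_weighted_class_impurity([2.75, 2.75], X_labeled, y_labeled, k=6)
chosen = select_next([[0.2, 0.2], [2.75, 2.75], [5.5, 5.5]], X_labeled, y_labeled, k=6)
```

In this toy setup the point inside one cluster gets a score near zero, while the point between the clusters scores close to the two-class maximum of 0.5, so an acquisition step based on this score would query the in-between candidate first. No classifier is trained at any point, which is the sense in which such a score is independent of the prediction model.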