Data as Voters: Core Set Selection Using Approval-Based Multi-Winner Voting
This work addresses the problem of reducing the size of training sets, a practical concern for machine learning practitioners, and offers an incremental improvement over existing core set selection methods.
The paper tackles the core set selection problem in machine learning by proposing a method based on approval-based multi-winner voting, in which data instances act as both voters and candidates. In experiments with neural networks, KNN, and SVM, the approach outperforms state-of-the-art methods in several cases, with statistically significant differences.
We present a novel approach to the core set/instance selection problem in machine learning. Our approach builds on recent results on (proportional) representation in approval-based multi-winner elections. In our model, instances play a double role as voters and candidates. The approval set of each instance in the training set (acting as a voter) is defined using the notion of a local set, which already exists in the literature. We then select the election winners with a representative voting rule; these winners are the data instances kept in the reduced training set. We evaluate our approach in two experiments involving neural network classifiers and classic machine learning classifiers (KNN and SVM). Our experiments show that, in several cases, our approach outperforms state-of-the-art methods, and the differences are statistically significant.
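The pipeline described above (local sets as approval sets, then a representative multi-winner rule) can be sketched in a few lines. This is a minimal illustration under assumptions not taken from the paper: Euclidean distance; the local set of an instance taken as all instances strictly closer to it than its nearest enemy (the closest instance with a different label), the usual definition in the instance-selection literature; and sequential Proportional Approval Voting (seq-PAV) standing in for the representative voting rule, with a hypothetical committee size `k`. The names `local_sets` and `seq_pav` are illustrative, not the authors' code.

```python
import numpy as np


def local_sets(X, y):
    """Local set of each instance (assumed definition): the instances strictly
    closer to it than its nearest enemy (closest instance with a different
    label). Such instances necessarily share its label."""
    # Pairwise Euclidean distances, shape (n, n); O(n^2) memory, fine for a sketch
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    sets = []
    for i in range(len(y)):
        enemies = y != y[i]
        # Distance to the nearest enemy (infinite if there is a single class)
        nearest_enemy = D[i, enemies].min() if enemies.any() else np.inf
        sets.append(set(np.flatnonzero(D[i] < nearest_enemy).tolist()))
    return sets


def seq_pav(approvals, k):
    """Sequential PAV (illustrative choice of representative rule): voter v
    approves the candidates in approvals[v]; candidates are the same indices
    0..n-1, since instances act as both voters and candidates."""
    n = len(approvals)
    # Invert the approval sets: which voters approve each candidate
    voters_of = [[] for _ in range(n)]
    for v, approved in enumerate(approvals):
        for c in approved:
            voters_of[c].append(v)
    # Number of already-elected candidates each voter approves
    satisfied = [0] * n
    committee = []
    for _ in range(min(k, n)):
        best, best_gain = None, -1.0
        for c in range(n):
            if c in committee:
                continue
            # Marginal PAV score: a voter with s approved winners adds 1/(s+1)
            gain = sum(1.0 / (1 + satisfied[v]) for v in voters_of[c])
            if gain > best_gain:  # strict '>' breaks ties by lowest index
                best, best_gain = c, gain
        committee.append(best)
        for v in voters_of[best]:
            satisfied[v] += 1
    return committee


# Toy data: two well-separated 1-D clusters with different labels
X = np.array([[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]])
y = np.array([0, 0, 0, 1, 1, 1])
kept = seq_pav(local_sets(X, y), k=2)
print(kept)  # → [0, 3]
```

With `k=2`, seq-PAV keeps one instance from each cluster rather than two from the same one: once a cluster has a winner, its voters' marginal contribution halves. This illustrates the kind of proportional representation the approach draws on.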