Uncovering Coresets for Classification With Multi-Objective Evolutionary Algorithms
This work addresses improving training speed and interpretability in machine learning, but it is incremental as it builds on existing coreset methods.
The paper tackles coreset discovery for classification by proposing a multi-objective evolutionary algorithm that optimizes the trade-off between subset size and classification error. On non-trivial benchmarks, the resulting coresets yield lower error and better generalization than state-of-the-art techniques.
A coreset is a subset of the training set on which a machine learning algorithm achieves performance similar to what it would deliver if trained on the whole original data. Coreset discovery is an active and open line of research, as it can speed up training and may help humans understand the results. Building on previous work, a novel approach is presented: candidate coresets are iteratively optimized by adding and removing samples. As there is an obvious trade-off between limiting training-set size and the quality of the results, a multi-objective evolutionary algorithm is used to simultaneously minimize the number of points in the set and the classification error. Experimental results on non-trivial benchmarks show that the proposed approach lets a classifier obtain lower error and better generalization to unseen data than state-of-the-art coreset discovery techniques.
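To make the two-objective formulation concrete, here is a minimal sketch in pure Python. It is not the authors' implementation: the abstract does not name the evolutionary algorithm, the classifier, or the data used, so everything below is an assumption chosen for illustration. The sketch uses a simple archive-based loop (one mutated child per generation), a 1-nearest-neighbour classifier, synthetic 2D data, and error measured on the full training set. The names `make_data`, `evaluate`, `dominates`, `mutate`, and `evolve` are hypothetical.

```python
import random

def make_data(n, rng):
    """Two Gaussian blobs in 2D with labels 0 and 1 (synthetic, illustration only)."""
    data = []
    for i in range(n):
        label = i % 2
        centre = 0.0 if label == 0 else 3.0
        data.append(((rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)), label))
    return data

def predict_1nn(coreset, x):
    """Label of the coreset point closest to x (assumed classifier: 1-NN)."""
    nearest = min(coreset, key=lambda p: (p[0][0] - x[0]) ** 2 + (p[0][1] - x[1]) ** 2)
    return nearest[1]

def evaluate(mask, train):
    """Two objectives to minimize: (coreset size, error rate on the full training set)."""
    coreset = [p for p, keep in zip(train, mask) if keep]
    if not coreset:
        return (0, 1.0)  # an empty candidate cannot classify anything
    errors = sum(predict_1nn(coreset, x) != y for x, y in train)
    return (len(coreset), errors / len(train))

def dominates(a, b):
    """Pareto dominance for minimization: a is no worse on both objectives and differs from b."""
    return all(u <= v for u, v in zip(a, b)) and a != b

def mutate(mask, rng):
    """Flip one random position: adds a sample to the candidate or removes one."""
    child = list(mask)
    i = rng.randrange(len(child))
    child[i] = not child[i]
    return tuple(child)

def evolve(train, generations=200, seed=0):
    """Keep an archive of mutually non-dominated candidates, keyed by objective values."""
    rng = random.Random(seed)
    start = tuple(rng.random() < 0.1 for _ in range(len(train)))
    archive = {evaluate(start, train): start}
    for _ in range(generations):
        parent = rng.choice(list(archive.values()))
        child = mutate(parent, rng)
        f = evaluate(child, train)
        # Accept the child only if no archived candidate dominates or duplicates it.
        if not any(dominates(g, f) or g == f for g in archive):
            archive = {g: m for g, m in archive.items() if not dominates(f, g)}
            archive[f] = child
    return sorted(archive.items())  # ordered by size, then error

if __name__ == "__main__":
    train = make_data(40, random.Random(1))
    for (size, error), _mask in evolve(train):
        print(f"size={size:2d}  error={error:.3f}")
```

Each entry of the returned front is one trade-off between size and error: moving to a larger coreset is kept only if it lowers the error. A practical system would more likely use an established multi-objective algorithm and measure error on held-out data, but the dominance test and the add/remove mutation carry over unchanged.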