LG AIMar 9, 2023

Provable Data Subset Selection For Efficient Neural Network Training

Murad Tukan, Samson Zhou, Alaa Maalouf, Daniela Rus, Vladimir Braverman, Dan Feldman

MIT

arXiv:2303.05151v113.016 citationsh-index: 33Has Code

Originality Incremental advance

AI Analysis

This addresses the challenge of reducing computational costs in training deep neural networks for researchers and practitioners, though it is incremental as it builds on existing coreset theory for a specific network type.

The paper tackles the problem of efficient neural network training by introducing the first algorithm to construct coresets for radial basis function neural networks (RBFNNs), which are small weighted subsets that approximate the loss and gradients of the full dataset, enabling provable data subset selection and demonstrating efficacy in empirical evaluations.

Radial basis function neural networks (\emph{RBFNN}) are {well-known} for their capability to approximate any continuous function on a closed bounded set with arbitrary precision given enough hidden neurons. In this paper, we introduce the first algorithm to construct coresets for \emph{RBFNNs}, i.e., small weighted subsets that approximate the loss of the input data on any radial basis function network and thus approximate any function defined by an \emph{RBFNN} on the larger input data. In particular, we construct coresets for radial basis and Laplacian loss functions. We then use our coresets to obtain a provable data subset selection algorithm for training deep neural networks. Since our coresets approximate every function, they also approximate the gradient of each weight in a neural network, which is a particular function on the input. We then perform empirical evaluations on function approximation and dataset subset selection on popular network architectures and data sets, demonstrating the efficacy and accuracy of our coreset construction.

View on arXiv PDF Code

Similar