A Novel Sequential Coreset Method for Gradient Descent Algorithms
This work addresses a central challenge in machine learning optimization by providing a more general and efficient coreset method, although the contribution appears incremental because it builds on existing coreset techniques.
The paper tackles the problem of efficiently compressing large datasets for gradient descent optimization. It introduces a sequential coreset framework that avoids reliance on pseudo-dimension and total sensitivity bounds, which reduces computational complexity and, for sparse optimization, yields a coreset size that depends only poly-logarithmically on the dimension. Experimental results show substantial running-time savings compared to baseline algorithms.
A wide range of optimization problems arising in machine learning can be solved by gradient descent algorithms, and a central question in this area is how to efficiently compress a large-scale dataset so as to reduce the computational complexity. {\em Coreset} is a popular data compression technique that has been extensively studied. However, most existing coreset methods are problem-dependent and cannot be used as a general tool for a broader range of applications. A key obstacle is that they often rely on the pseudo-dimension and a total sensitivity bound, which can be very high or hard to obtain. In this paper, based on the ``locality'' property of gradient descent algorithms, we propose a new framework, termed ``sequential coreset'', which effectively avoids these obstacles. Moreover, our method is particularly suitable for sparse optimization, where the coreset size can be further reduced to depend only poly-logarithmically on the dimension. In practice, the experimental results suggest that our method can save a large amount of running time compared with the baseline algorithms.
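To make the ``locality'' idea concrete, the following is a minimal sketch rather than the paper's algorithm. It assumes a least-squares objective, and it uses uniform sampling with weights n/m as a stand-in for whatever sampling scheme the paper actually uses, which the abstract does not specify. The function names (`sample_local_coreset`, `sequential_coreset_gd`) and the parameters `m`, `radius`, and `lr` are hypothetical. The only point illustrated is the sequential structure: gradient descent runs on a small weighted sample that is trusted only near the point where it was built, and the sample is rebuilt once the iterate leaves that local region.

```python
import numpy as np


def sample_local_coreset(n, m, rng):
    """Placeholder sampler (assumption): m uniform indices, each weighted n/m.

    The paper's actual sampling distribution is not given in the abstract.
    """
    idx = rng.choice(n, size=m, replace=False)
    weights = np.full(m, n / m)
    return idx, weights


def sequential_coreset_gd(X, y, m=200, radius=0.5, lr=0.1, steps=300, seed=0):
    """Gradient descent for mean squared error using a sequence of local samples.

    A new sample is drawn whenever the iterate moves farther than `radius`
    from the point (`anchor`) at which the current sample was built.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    beta = np.zeros(d)
    anchor = beta.copy()
    idx, w = sample_local_coreset(n, m, rng)
    rebuilds = 1  # counts how many samples were built in total

    for _ in range(steps):
        Xs, ys = X[idx], y[idx]
        residual = Xs @ beta - ys
        # Weighted estimate of the full-data gradient of (1/2n) * ||X beta - y||^2.
        grad = Xs.T @ (w * residual) / n
        beta = beta - lr * grad

        # Locality check: the current sample is only trusted near its anchor.
        if np.linalg.norm(beta - anchor) > radius:
            anchor = beta.copy()
            idx, w = sample_local_coreset(n, m, rng)
            rebuilds += 1

    return beta, rebuilds
```

Each gradient step here touches only `m` rows instead of all `n`, which is where the running-time saving would come from. The guarantees claimed in the abstract depend on the paper's own construction, not on this uniform-sampling placeholder.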