ML LG · Jul 10, 2024

Using Low-Discrepancy Points for Data Compression in Machine Learning: An Experimental Comparison

Simone Göttlich, Jacob Heieck, Andreas Neuenkirch

arXiv:2407.07450v2 · 2 citations · h-index: 2

Originality: Synthesis-oriented

AI Analysis

This work addresses data compression for machine learning practitioners, but it appears incremental as it builds on existing methods without claiming major breakthroughs.

The paper addresses the problem of reducing large data sets for neural network training. It explores two methods based on low-discrepancy points and compares them with supercompress, a variant of K-means clustering, in terms of compression error and training accuracy. The abstract reports no concrete numerical results.

Low-discrepancy points (also called Quasi-Monte Carlo points) are deterministically and cleverly chosen point sets in the unit cube, which provide an approximation of the uniform distribution. We explore two methods based on such low-discrepancy points to reduce large data sets in order to train neural networks. The first one is the method of Dick and Feischl [4], which relies on digital nets and an averaging procedure. Motivated by our experimental findings, we construct a second method, which again uses digital nets, but Voronoi clustering instead of averaging. Both methods are compared to the supercompress approach of [14], which is a variant of the K-means clustering algorithm. The comparison is done in terms of the compression error for different objective functions and the accuracy of the training of a neural network.
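To make the idea of compressing data onto low-discrepancy points more concrete, here is a minimal sketch in pure Python. It is not the authors' implementation: it uses a Halton sequence as a simple stand-in for the digital nets named in the abstract, and the `voronoi_compress` helper (a hypothetical name) is only one plausible reading of "Voronoi clustering", in which each data point is assigned to its nearest low-discrepancy point and each non-empty cell is replaced by its mean together with a weight. The data set, the number of centers, and the test function are all assumptions chosen for illustration.

```python
import random


def van_der_corput(i, base):
    """Radical inverse of the integer i in the given base, a value in [0, 1)."""
    x, denom = 0.0, 1.0
    while i > 0:
        i, digit = divmod(i, base)
        denom *= base
        x += digit / denom
    return x


def halton(n, bases=(2, 3)):
    """First n points of the Halton sequence in the unit cube (index 0 skipped).

    Assumption: Halton points stand in for the digital nets used in the paper.
    """
    return [tuple(van_der_corput(i, b) for b in bases) for i in range(1, n + 1)]


def voronoi_compress(data, centers):
    """Assign each data point to its nearest center (its Voronoi cell).

    Returns one (cell mean, weight) pair per non-empty cell, where the weight
    is the fraction of the data that fell into that cell.
    """
    dim = len(centers[0])
    sums = [[0.0] * dim for _ in centers]
    counts = [0] * len(centers)
    for p in data:
        nearest = min(
            range(len(centers)),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])),
        )
        counts[nearest] += 1
        for d, a in enumerate(p):
            sums[nearest][d] += a
    n = len(data)
    return [
        (tuple(s / c for s in row), c / n)
        for row, c in zip(sums, counts)
        if c > 0
    ]


# Illustrative data: 2000 random points in the unit square (an assumption).
random.seed(0)
data = [(random.random(), random.random()) for _ in range(2000)]
compressed = voronoi_compress(data, halton(32))


def objective(p):
    """Hypothetical objective function used to measure the compression error."""
    return p[0] * p[1]


full_mean = sum(objective(p) for p in data) / len(data)
compressed_mean = sum(w * objective(p) for p, w in compressed)
print(len(compressed), abs(full_mean - compressed_mean))
```

The printed gap between the full-data average and the weighted average over the compressed set is a crude analogue of the compression error mentioned in the abstract. For a linear objective the weighted cell means reproduce the full average up to rounding, so a nonlinear objective is used here.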
