LG ML | Mar 18, 2024

The Power of Few: Accelerating and Enhancing Data Reweighting with Coreset Selection

Mohammad Jafari, Yimeng Zhang, Yihua Zhang, Sijia Liu

arXiv:2403.12166v36.44 citations | h-index: 20 | ICASSP

Originality: Incremental advance

AI Analysis

This work addresses the unsustainable computational cost of ML training for practitioners, though it appears incremental because it builds on existing coreset selection and data reweighting techniques.

The paper tackles the challenge of balancing computational efficiency and model accuracy in machine learning by introducing a coreset selection method for data reweighting, which reduces computational time while maintaining performance.

As machine learning tasks continue to evolve, the trend has been to gather larger datasets and train increasingly larger models. While this has led to advancements in accuracy, it has also escalated computational costs to unsustainable levels. Addressing this, our work aims to strike a delicate balance between computational efficiency and model accuracy, a persisting challenge in the field. We introduce a novel method that employs core subset selection for reweighting, effectively optimizing both computational time and model performance. By focusing on a strategically selected coreset, our approach offers a robust representation, as it efficiently minimizes the influence of outliers. The re-calibrated weights are then mapped back to and propagated across the entire dataset. Our experimental results substantiate the effectiveness of this approach, underscoring its potential as a scalable and precise solution for model training.
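The abstract describes a pipeline of selecting a coreset, reweighting only those points, and mapping the weights back to the full dataset, but it does not specify the selection or propagation rules. The following is a minimal sketch under stated assumptions: k-center greedy selection and nearest-neighbor propagation are stand-ins chosen for illustration, not the authors' confirmed method, and the function names `k_center_greedy` and `propagate_weights` are hypothetical. The reweighting step itself is left out; the coreset weights are assumed to come from whatever reweighting procedure is run on the coreset.

```python
import numpy as np


def k_center_greedy(X, k, seed=0):
    """Pick k row indices of X by farthest-point (k-center greedy) selection.

    Assumption: this is an illustrative coreset rule, not the paper's.
    """
    rng = np.random.default_rng(seed)
    first = int(rng.integers(X.shape[0]))
    selected = [first]
    # Distance from every point to its nearest selected center so far.
    dist = np.linalg.norm(X - X[first], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(dist))  # farthest point from current centers
        selected.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(X - X[nxt], axis=1))
    return np.array(selected)


def propagate_weights(X, coreset_idx, coreset_weights):
    """Give each point the weight of its nearest coreset member.

    Assumption: nearest-neighbor propagation is one simple way to map
    coreset weights back to the full dataset.
    """
    centers = X[coreset_idx]
    # Pairwise distances, shape (n_points, n_coreset).
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    nearest = np.argmin(d, axis=1)
    return np.asarray(coreset_weights)[nearest]


if __name__ == "__main__":
    # Two well-separated toy clusters of two points each.
    X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
    idx = k_center_greedy(X, k=2)
    # Placeholder weights standing in for the output of a reweighting step.
    coreset_weights = np.array([0.25, 0.75])
    print(propagate_weights(X, idx, coreset_weights))
```

In this sketch the reweighting cost scales with the coreset size k rather than the dataset size, while the propagation step is a single distance computation over the full dataset.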
