Increasing Data Diversity with Iterative Sampling to Improve Performance
This work addresses data quality issues for practitioners in data-centric AI, though it appears incremental because it builds on existing augmentation techniques.
The paper tackles the problem of limited training data diversity by proposing an iterative sampling method that focuses on augmenting difficult classes and edge cases, resulting in improved model performance as demonstrated in the Data-Centric AI Competition.
As part of the Data-Centric AI Competition, we propose a data-centric approach that improves the diversity of the training samples through iterative sampling. The method relies strongly on the fidelity of the augmented samples and the diversity of the augmentation methods. We further improve performance by introducing more samples for the difficult classes, in particular samples close to edge cases, potentially those that the model at hand misclassifies.
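The iterative sampling idea can be illustrated with a minimal sketch. It is not the competition implementation. The function names (`per_class_error`, `allocate_budget`, `iterative_sampling`), the `fit` and `augmenters` arguments, and the rule of splitting the augmentation budget in proportion to per-class error are all assumptions made for illustration. Misclassified training samples stand in for edge cases here.

```python
import random
from collections import Counter


def per_class_error(labels, predictions):
    """Return the fraction of misclassified samples for each class."""
    total, wrong = Counter(), Counter()
    for y, p in zip(labels, predictions):
        total[y] += 1
        if y != p:
            wrong[y] += 1
    return {c: wrong[c] / total[c] for c in total}


def allocate_budget(errors, budget):
    """Split the budget across classes in proportion to error rate (assumed rule)."""
    total_error = sum(errors.values())
    if total_error == 0:
        # No class is harder than another: spread the budget evenly.
        return {c: budget // len(errors) for c in errors}
    # int() floors each share, so the shares may sum to slightly less than budget.
    return {c: int(budget * e / total_error) for c, e in errors.items()}


def iterative_sampling(samples, labels, fit, augmenters, rounds, budget, seed=0):
    """Grow the training set round by round around misclassified samples.

    fit(samples, labels) is a hypothetical callable returning a predict(x)
    function; augmenters is a list of hypothetical sample -> sample callables.
    """
    rng = random.Random(seed)
    samples, labels = list(samples), list(labels)
    for _ in range(rounds):
        predict = fit(samples, labels)  # retrain on the current set
        preds = [predict(x) for x in samples]
        quota = allocate_budget(per_class_error(labels, preds), budget)

        # Group misclassified samples by their true class.
        hard = {}
        for x, y, p in zip(samples, labels, preds):
            if y != p:
                hard.setdefault(y, []).append(x)

        # Augment hard samples, drawing a random augmenter each time.
        for cls, n in quota.items():
            pool = hard.get(cls, [])
            if not pool:
                continue
            for _ in range(n):
                augment = rng.choice(augmenters)
                samples.append(augment(rng.choice(pool)))
                labels.append(cls)
    return samples, labels
```

In this sketch, drawing from several augmenters is what supplies diversity, and keeping each augmentation small is what keeps new samples close to the original edge case. Both properties depend entirely on the augmenters supplied by the caller.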