CV · Sep 20, 2024

Data Pruning via Separability, Integrity, and Model Uncertainty-Aware Importance Sampling

Steven Grosz, Rui Zhao, Rajeev Ranjan, Hongcheng Wang, Manoj Aggarwal, Gerard Medioni, Anil Jain

arXiv:2409.13915v1 · 3.7 · 1 citations · h-index: 3

Originality: Incremental advance

AI Analysis

This work addresses efficient dataset reduction for image classification, particularly in fine-grained scenarios, though it appears incremental over existing data pruning methods.

The paper tackles data pruning for image classification by introducing a new pruning metric and a pruning procedure based on importance sampling that account for data separability, integrity, and model uncertainty. Experiments on four benchmark datasets show that the approach scales well to high pruning ratios and generalizes across models.

This paper improves upon existing data pruning methods for image classification by introducing a novel pruning metric and pruning procedure based on importance sampling. The proposed pruning metric explicitly accounts for data separability, data integrity, and model uncertainty, while the sampling procedure adapts to the pruning ratio and considers both intra-class and inter-class separation to further enhance the effectiveness of pruning. Furthermore, the sampling method can readily be applied to other pruning metrics to improve their performance. Overall, the proposed approach scales well to high pruning ratios and generalizes better across different classification models, as demonstrated by experiments on four benchmark datasets, including a fine-grained classification setting.
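The abstract describes the sampling step only at a high level. As a rough illustration of the general idea of score-driven importance sampling for pruning, the sketch below keeps a fraction `1 - prune_ratio` of each class by drawing samples without replacement with probability proportional to a precomputed, non-negative per-sample score. The function name `importance_sample_prune`, the proportional-to-score weighting, and the per-class quota are assumptions made for this example; they are not the paper's metric or procedure, which additionally adapts to the pruning ratio in ways not detailed here.

```python
import numpy as np


def importance_sample_prune(scores, labels, prune_ratio, seed=0):
    """Return sorted indices of the samples kept after pruning.

    Hypothetical sketch: `scores` is any non-negative per-sample importance
    score (NOT the paper's metric). Within each class, samples are drawn
    without replacement with probability proportional to their score, so the
    retained subset keeps the original class proportions (up to rounding).
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    if not 0.0 <= prune_ratio < 1.0:
        raise ValueError("prune_ratio must be in [0, 1)")
    rng = np.random.default_rng(seed)
    kept = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        # Per-class quota; always keep at least one sample per class.
        n_keep = max(1, int(round(len(idx) * (1.0 - prune_ratio))))
        # Small epsilon so zero-score samples remain drawable; this also
        # guarantees enough non-zero probabilities for replace=False.
        w = scores[idx] + 1e-12
        p = w / w.sum()
        kept.append(rng.choice(idx, size=n_keep, replace=False, p=p))
    return np.sort(np.concatenate(kept))


if __name__ == "__main__":
    labels = np.repeat([0, 1, 2], 100)
    scores = np.random.default_rng(1).random(300)  # placeholder scores
    keep = importance_sample_prune(scores, labels, prune_ratio=0.9)
    print(len(keep))  # 30 (10 per class)
```

Because the function only consumes a score array, any existing pruning metric could be plugged in as `scores`, which loosely mirrors the abstract's point that the sampling method can be combined with other metrics. Sampling rather than deterministic top-k selection keeps some lower-scored samples at high pruning ratios, which is one common motivation for importance sampling in this setting.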

