CV, AI | Nov 28, 2025

Adaptive Dataset Quantization: A New Direction for Dataset Pruning

arXiv:2512.05987v1
Originality: Incremental advance
AI Analysis

This paper addresses storage and communication challenges on resource-constrained edge devices. It is an incremental advance that targets intra-sample redundancy rather than the inter-sample redundancy handled by existing methods.

The paper tackles the high storage and communication costs of large-scale datasets on edge devices. It proposes a dataset quantization method that reduces intra-sample redundancy, compressing the dataset while maintaining model training performance on CIFAR-10, CIFAR-100, and ImageNet-1K.

This paper addresses the storage and communication costs of large-scale datasets on resource-constrained edge devices by proposing a dataset quantization approach that reduces intra-sample redundancy. Unlike traditional dataset pruning and distillation methods, which focus on inter-sample redundancy, the proposed method compresses each image by removing redundant or less informative content within the sample while preserving essential features. It first applies linear symmetric quantization to obtain an initial quantization range and scale for each sample. An adaptive quantization allocation algorithm then assigns different quantization ratios to samples with different precision requirements while keeping the total compression ratio constant. The stated contributions are: (1) being the first to represent datasets with a limited number of bits to reduce storage; (2) introducing a dataset-level quantization algorithm with adaptive ratio allocation; and (3) validating the method through extensive experiments on CIFAR-10, CIFAR-100, and ImageNet-1K. Results show that the method maintains model training performance while achieving significant dataset compression, and that it outperforms traditional quantization and dataset pruning baselines at the same compression ratios.
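The two steps described above can be illustrated with a minimal sketch. It is not the paper's implementation: the summary does not specify the allocation algorithm, so `allocate_bits` is a hypothetical greedy heuristic that hands out extra bits by reconstruction error. The function names, the `min_bits`/`max_bits` limits, and the assumption that samples are zero-centered float arrays are all illustrative choices.

```python
import numpy as np

def symmetric_quantize(x, bits):
    """Linear symmetric quantization of one sample to signed `bits`-bit integers.

    Assumes `x` is roughly zero-centered (e.g. a normalized image).
    """
    qmax = 2 ** (int(bits) - 1) - 1           # largest representable integer level
    max_abs = float(np.max(np.abs(x)))        # per-sample quantization range
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int32)
    return q, scale

def dequantize(q, scale):
    """Map integer levels back to approximate float values."""
    return q.astype(np.float32) * scale

def allocate_bits(samples, avg_bits, min_bits=2, max_bits=8):
    """Hypothetical greedy allocation (an assumption, not the paper's algorithm).

    Every sample starts at `min_bits`; the remaining budget is spent one bit
    at a time on the sample with the largest current reconstruction error,
    so the average bit width stays fixed at `avg_bits`.
    """
    n = len(samples)
    bits = np.full(n, min_bits, dtype=int)
    budget = int(avg_bits * n - bits.sum())

    def error(i):
        q, scale = symmetric_quantize(samples[i], bits[i])
        return float(np.mean((samples[i] - dequantize(q, scale)) ** 2))

    errors = np.array([error(i) for i in range(n)])
    for _ in range(budget):
        # Samples already at max_bits are excluded from further allocation.
        candidates = np.where(bits < max_bits, errors, -np.inf)
        i = int(np.argmax(candidates))
        if not np.isfinite(candidates[i]):
            break
        bits[i] += 1
        errors[i] = error(i)
    return bits
```

For example, calling `allocate_bits(samples, avg_bits=4)` on a list of normalized image arrays returns one bit width per sample whose mean is 4, with samples that are harder to reconstruct receiving more bits.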

