LG, CV · Sep 4, 2025

Data-Augmented Quantization-Aware Knowledge Distillation

arXiv:2509.03850v1 · h-index: 1

Originality: Incremental advance

AI Analysis

This work addresses a specific bottleneck in creating efficient low-bit deep learning models for deployment in resource-constrained environments, offering an incremental but practical enhancement to existing quantization and distillation techniques.

The paper tackles the problem of selecting effective data augmentation strategies for quantization-aware knowledge distillation, particularly for low-precision models. It proposes a novel metric that evaluates augmentations based on contextual mutual information and class prediction alignment. According to the authors, selecting augmentations with this metric yields significant improvements over state-of-the-art methods across various architectures and datasets.

Quantization-aware training (QAT) and knowledge distillation (KD) are combined to achieve competitive performance in creating low-bit deep learning models. Existing KD and QAT work focuses on improving the accuracy of quantized models from the network output perspective, by designing better KD loss functions or optimizing QAT's forward and backward propagation. However, limited attention has been given to understanding the impact of input transformations, such as data augmentation (DA). The relationship between quantization-aware KD and DA remains unexplored. In this paper, we address the question: how should a good DA be selected for quantization-aware KD, especially for low-precision models? We propose a novel metric which evaluates DAs according to their capacity to maximize the Contextual Mutual Information (the information not directly related to an image's label) while also ensuring that the predictions for each class are close to the ground truth labels on average. The proposed method automatically ranks and selects DAs, requires minimal training overhead, and is compatible with any KD or QAT algorithm. Extensive evaluations demonstrate that selecting DA strategies using our metric significantly improves state-of-the-art QAT and KD methods across various model architectures and datasets.
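The abstract does not give the metric's formula, so the following is only a rough sketch of how such a ranking could look, not the authors' method. It assumes that Contextual Mutual Information can be estimated as the average KL divergence between each sample's predicted distribution and the mean prediction of its class, and that class alignment can be measured as the cross-entropy between each class-mean prediction and its label. The function names, the `weight` trade-off parameter, and the way the two terms are combined are all hypothetical.

```python
import numpy as np

def class_mean_predictions(probs, labels, num_classes):
    """Average predicted distribution over the samples of each class."""
    centroids = np.zeros((num_classes, probs.shape[1]))
    for y in range(num_classes):
        mask = labels == y
        if mask.any():
            centroids[y] = probs[mask].mean(axis=0)
    return centroids

def contextual_mi(probs, labels, num_classes, eps=1e-12):
    """Assumed estimator: mean KL(sample prediction || class-mean prediction)."""
    centroids = class_mean_predictions(probs, labels, num_classes)
    q = centroids[labels]  # class-mean prediction matched to each sample
    kl = np.sum(probs * (np.log(probs + eps) - np.log(q + eps)), axis=1)
    return float(kl.mean())

def class_alignment_loss(probs, labels, num_classes, eps=1e-12):
    """Assumed estimator: cross-entropy of each class-mean prediction
    against its own label, averaged over the classes that are present."""
    centroids = class_mean_predictions(probs, labels, num_classes)
    present = np.unique(labels)
    return float(np.mean(-np.log(centroids[present, present] + eps)))

def da_score(probs, labels, num_classes, weight=1.0):
    """Hypothetical combination: reward contextual information,
    penalize class-mean predictions that drift from the labels."""
    cmi = contextual_mi(probs, labels, num_classes)
    align = class_alignment_loss(probs, labels, num_classes)
    return cmi - weight * align

def rank_augmentations(preds_by_da, labels, num_classes, weight=1.0):
    """Sort augmentation names by score, best first."""
    scores = {
        name: da_score(probs, labels, num_classes, weight)
        for name, probs in preds_by_da.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Here `preds_by_da` would map each candidate augmentation to the softmax outputs a model produces on data augmented that way, with one row per sample. Which model supplies those outputs (for example, the full-precision teacher) is also an assumption, since the abstract does not say.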
