Punching Above Precision: Small Quantized Model Distillation with Learnable Regularizer
This work addresses the problem of deploying efficient AI models on resource-constrained hardware, offering an incremental improvement over existing QAT-KD methods.
The paper tackles the challenge of balancing task-specific and distillation losses in quantization-aware training with knowledge distillation for small quantized models. It proposes a learnable regularization method (GoR) that improves performance and convergence, achieving state-of-the-art results in image classification, object detection, and LLM compression while maintaining full-precision accuracy on edge devices.
Quantization-aware training (QAT) combined with knowledge distillation (KD) is a promising strategy for compressing artificial intelligence (AI) models for deployment on resource-constrained hardware. However, existing QAT-KD methods often struggle to balance task-specific (TS) and distillation losses due to heterogeneous gradient magnitudes, especially under low-bit quantization. We propose Game of Regularizer (GoR), a novel learnable regularization method that adaptively balances TS and KD objectives using only two trainable parameters for dynamic loss weighting. GoR reduces conflict between supervision signals, improves convergence, and boosts the performance of small quantized models (SQMs). Experiments on image classification, object detection (OD), and large language model (LLM) compression show that GoR consistently outperforms state-of-the-art QAT-KD methods. On low-power edge devices, it delivers faster inference while maintaining full-precision accuracy. We also introduce QAT-EKD-GoR, an ensemble distillation framework that uses multiple heterogeneous teacher models. Under optimal conditions, the proposed QAT-EKD-GoR can outperform full-precision models, providing a robust solution for real-world deployment.
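The abstract does not state the GoR formula, so the following is only an illustrative sketch of the general idea of weighting a TS loss and a KD loss with two trainable parameters. It swaps in a simplified form of the well-known uncertainty-weighting scheme, which is not necessarily what GoR does. The names `balanced_loss`, `grad_wrt_s`, `fit_weights`, `s_ts`, and `s_kd` are hypothetical, and the loss values are made-up constants chosen to mimic heterogeneous magnitudes.

```python
import math

def balanced_loss(ts_loss, kd_loss, s_ts, s_kd):
    """Combine two losses using two learnable log-scale parameters.

    Assumption: a stand-in for a learnable regularizer, not the GoR formula.
    Each loss is scaled by exp(-s); the additive s term penalizes driving
    the weight toward zero.
    """
    return math.exp(-s_ts) * ts_loss + s_ts + math.exp(-s_kd) * kd_loss + s_kd

def grad_wrt_s(loss_value, s):
    """Analytic derivative of exp(-s) * loss_value + s with respect to s."""
    return 1.0 - math.exp(-s) * loss_value

def fit_weights(ts_loss, kd_loss, lr=0.1, steps=500):
    """Gradient descent on the two parameters with the losses held fixed."""
    s_ts, s_kd = 0.0, 0.0
    for _ in range(steps):
        s_ts -= lr * grad_wrt_s(ts_loss, s_ts)
        s_kd -= lr * grad_wrt_s(kd_loss, s_kd)
    return s_ts, s_kd

# Made-up loss magnitudes: the TS loss is 16x larger than the KD loss.
s_ts, s_kd = fit_weights(ts_loss=4.0, kd_loss=0.25)
w_ts, w_kd = math.exp(-s_ts), math.exp(-s_kd)
print(f"TS weight: {w_ts:.3f}, KD weight: {w_kd:.3f}")
```

With the losses frozen, each weight settles near the reciprocal of its loss, so the larger objective is scaled down and the smaller one is scaled up until both contribute equally. In real training the losses change every step and the two parameters would be updated by the same optimizer as the student network.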