PQK: Model Compression via Pruning, Quantization, and Knowledge Distillation
This work addresses the challenge of efficient DNN deployment on edge devices, offering an incremental improvement by integrating existing compression techniques in a novel way.
To deploy deep neural networks on edge devices with limited computational resources, the authors propose PQK, a model compression method that combines pruning, quantization, and knowledge distillation. PQK achieves competitive results on keyword spotting and image recognition tasks without requiring a pre-trained teacher model.
As edge devices become prevalent, deploying Deep Neural Networks (DNNs) on them has become a critical issue. However, DNNs require substantial computational resources, which are rarely available on edge devices. To handle this, we propose a novel model compression method for devices with limited computational resources, called PQK, consisting of pruning, quantization, and knowledge distillation (KD) processes. Unlike traditional pruning and KD, PQK makes use of the unimportant weights pruned in the pruning process to make a teacher network for training a better student network without pre-training the teacher model. PQK has two phases. Phase 1 exploits iterative pruning and quantization-aware training to make a lightweight and power-efficient model. In phase 2, we make a teacher network by adding the unimportant weights unused in phase 1 back to the pruned network. Using this teacher network, we train the pruned network as a student network. In doing so, we do not need a pre-trained teacher network for the KD framework because the teacher and the student networks coexist within the same network. We apply our method to recognition models and verify the effectiveness of PQK on keyword spotting (KWS) and image recognition.
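The two phases described above can be illustrated with a minimal sketch on a single linear layer. This is not the authors' implementation. It assumes magnitude-based pruning, symmetric uniform fake quantization, and a temperature-softened KL divergence as the distillation loss, none of which the abstract specifies. It also assumes the teacher path uses the unquantized dense weights. All function names (`magnitude_mask`, `fake_quantize`, `linear`, `kd_loss`) are hypothetical.

```python
import math

def magnitude_mask(weights, sparsity):
    """Return a 0/1 mask that prunes the smallest-magnitude weights.

    Assumption: importance is measured by |w|. Ties at the threshold
    are pruned, so slightly more than `sparsity` may be removed.
    """
    flat = sorted(abs(w) for row in weights for w in row)
    n_prune = int(len(flat) * sparsity)
    if n_prune == 0:
        return [[1 for _ in row] for row in weights]
    threshold = flat[n_prune - 1]
    return [[1 if abs(w) > threshold else 0 for w in row] for row in weights]

def fake_quantize(weights, num_bits=8):
    """Round weights to a symmetric uniform grid but keep them as floats."""
    max_abs = max(abs(w) for row in weights for w in row)
    if max_abs == 0:
        return [row[:] for row in weights]
    levels = 2 ** (num_bits - 1) - 1
    scale = max_abs / levels
    return [[round(w / scale) * scale for w in row] for row in weights]

def linear(weights, x, mask=None):
    """Compute W @ x, optionally zeroing out masked (pruned) weights."""
    out = []
    for i, row in enumerate(weights):
        total = 0.0
        for j, w in enumerate(row):
            keep = 1 if mask is None else mask[i][j]
            total += keep * w * x[j]
        out.append(total)
    return out

def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    s = sum(exps)
    return [e / s for e in exps]

def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    return sum(t * math.log(t / s) for t, s in zip(p_t, p_s) if t > 0)

# One shared weight matrix holds both networks.
weights = [[0.9, -0.05, 0.4], [0.02, -0.7, 0.1]]
x = [1.0, 2.0, -1.0]

# Phase 1 (sketch): prune by magnitude, then fake-quantize the weights.
mask = magnitude_mask(weights, sparsity=0.5)
quantized = fake_quantize(weights, num_bits=8)

# Phase 2 (sketch): student = pruned weights, teacher = pruned + unimportant weights.
student_logits = linear(quantized, x, mask)
teacher_logits = linear(weights, x)  # assumption: teacher path is dense and unquantized
loss = kd_loss(student_logits, teacher_logits)
```

In a real training loop, `loss` would be combined with a task loss and minimized by gradient descent. The point of the sketch is only that the student and the teacher read from the same weight matrix, differing by the pruning mask.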