LG, CV, ML · Jul 18, 2018

Self-supervised Knowledge Distillation Using Singular Value Decomposition

arXiv:1807.06819v1 · 151 citations · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the computational and data efficiency issues in deep learning for practitioners needing smaller models, though it appears incremental as it builds on existing distillation methods.

The paper tackles insufficient knowledge transfer in teacher-student deep neural networks by proposing a knowledge distillation method based on singular value decomposition and by framing the transfer as a self-supervised task. The resulting student network, at 1/5 of the teacher's computational cost, achieves up to 1.1% better classification accuracy than the teacher, and it outperforms a student trained with state-of-the-art distillation by 1.79% at the same cost.

Deep neural networks (DNNs) require huge training datasets and heavy computation. To address both issues, the so-called teacher-student (T-S) DNN, which transfers the knowledge of a teacher network (T-DNN) to a student network (S-DNN), has been proposed. However, existing T-S DNNs have a limited range of use, and the knowledge of the T-DNN is insufficiently transferred to the S-DNN. To improve the quality of the knowledge transferred from the T-DNN, we propose a new knowledge distillation method using singular value decomposition (SVD). In addition, we define knowledge transfer as a self-supervised task and suggest a way to continuously receive information from the T-DNN. Simulation results show that an S-DNN with 1/5 of the computational cost of the T-DNN can be up to 1.1% better than the T-DNN in classification accuracy. Also, assuming the same computational cost, our S-DNN outperforms the S-DNN driven by the state-of-the-art distillation by 1.79%. Code is available at https://github.com/sseung0703/SSKD_SVD.
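The abstract does not spell out how SVD enters the distillation loss, so the following is only an illustrative sketch of one way a feature map could be compressed with truncated SVD and compared between teacher and student. The function names, the choice of `k`, the sign-alignment step, the mean-squared-error loss, and the assumption that both networks expose feature maps with the same channel count are all hypothetical; the authors' actual procedure is in the linked repository.

```python
import numpy as np

def truncated_svd_features(feature_map, k):
    """Return the k leading singular values and right singular vectors
    of an (H, W, C) feature map flattened to an (H*W, C) matrix."""
    h, w, c = feature_map.shape
    matrix = feature_map.reshape(h * w, c)  # rows: positions, cols: channels
    _, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return s[:k], vt[:k]                    # shapes (k,) and (k, C)

def align_signs(student_v, teacher_v):
    """Flip student vectors that point opposite to the teacher's.
    Singular vectors are only defined up to sign, so this removes
    a spurious source of mismatch (an assumption of this sketch)."""
    signs = np.sign(np.sum(student_v * teacher_v, axis=1, keepdims=True))
    signs[signs == 0] = 1.0
    return student_v * signs

def distillation_loss(student_map, teacher_map, k=4):
    """Mean squared error between sign-aligned leading singular vectors.
    Hypothetical loss; assumes both maps have the same channel count."""
    _, sv = truncated_svd_features(student_map, k)
    _, tv = truncated_svd_features(teacher_map, k)
    return float(np.mean((align_signs(sv, tv) - tv) ** 2))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    teacher = rng.standard_normal((4, 4, 8))
    student = rng.standard_normal((4, 4, 8))
    print("loss vs. itself:", distillation_loss(teacher, teacher))
    print("loss vs. random student:", distillation_loss(student, teacher))
```

Keeping only the leading singular vectors gives a compact summary of a layer's activations, which is one plausible reading of how SVD could raise the quality of transferred knowledge; in a real training loop the loss would be computed with a differentiable framework rather than NumPy.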

Code Implementations (3 repos)
