CV · Mar 9, 2020

Pacemaker: Intermediate Teacher Knowledge Distillation For On-The-Fly Convolutional Neural Network

Wonchul Son, Youngbin Kim, Wonseok Song, Youngsu Moon, Wonjun Hwang

arXiv:2003.03944v11.2

Originality: Incremental advance

AI Analysis

This work addresses the need for efficient neural networks on resource-constrained devices such as SoCs and embedded systems, though the contribution is incremental because it builds on existing knowledge distillation techniques.

The paper tackles the problem of training convolutional neural networks for on-the-fly systems with low-performance hardware by proposing pacemaker knowledge distillation as an intermediate teacher, resulting in a 5.39% accuracy increase on CIFAR100 compared to conventional distillation and reduced training instability.

There is a need for on-the-fly computation on very low-performance systems such as system-on-chip (SoC) and embedded devices. This paper presents pacemaker knowledge distillation, which uses an intermediate ensemble teacher so that convolutional neural networks can be used on these systems. For the on-the-fly system, we consider a student model that uses 1xN on-the-fly filters and a teacher model that uses normal NxN filters. We note three issues in training the student model that are caused by applying the on-the-fly filter. First, the student keeps the same depth as the teacher but is unavoidably compressed into a thinner model. Second, there is a large capacity gap and parameter-size gap, because only the horizontal receptive field can be selected, not the vertical one. Third, direct distillation suffers from performance instability and degradation. To solve these problems, we propose an intermediate teacher, named pacemaker, for the on-the-fly student, so that the student can be trained from the pacemaker and the original teacher step by step. Experiments show that the proposed method yields significant accuracy improvements: on CIFAR100 with WRN-40-4, accuracy increases by 5.39% over conventional knowledge distillation, which performs even worse than the baseline. The proposed pacemaker knowledge distillation also resolves the training instability that occurs when conventional knowledge distillation is applied on its own, by reducing the deviation range.
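The ideas in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation. It assumes the commonly used temperature-scaled distillation loss (KL divergence between softened teacher and student outputs plus cross-entropy on the label), which the abstract does not spell out. The channel count of 64, the values of `T` and `alpha`, and the `teacher_for_epoch` switch schedule are hypothetical choices made only for illustration.

```python
import math

def conv_params(c_in, c_out, kh, kw):
    """Weight count of one 2-D convolution layer (bias ignored)."""
    return c_in * c_out * kh * kw

def log_softmax(logits, T=1.0):
    """Numerically stable log-softmax of logits divided by temperature T."""
    scaled = [z / T for z in logits]
    m = max(scaled)
    lse = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - lse for z in scaled]

def kd_loss(student_logits, teacher_logits, label, T=4.0, alpha=0.9):
    """Assumed standard distillation loss, not taken from the paper:
    alpha * T^2 * KL(teacher || student) + (1 - alpha) * cross-entropy."""
    log_t = log_softmax(teacher_logits, T)
    log_s = log_softmax(student_logits, T)
    kl = sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_t, log_s))
    ce = -log_softmax(student_logits)[label]
    return alpha * T * T * kl + (1 - alpha) * ce

def teacher_for_epoch(epoch, switch_epoch):
    """Hypothetical step-by-step schedule: distill from the pacemaker
    first, then from the original teacher after switch_epoch."""
    return "pacemaker" if epoch < switch_epoch else "teacher"

# Parameter gap between an NxN teacher filter and a 1xN student filter
# for N = 3 and a hypothetical 64 -> 64 channel layer.
teacher_weights = conv_params(64, 64, 3, 3)
student_weights = conv_params(64, 64, 1, 3)
print(teacher_weights, student_weights, teacher_weights // student_weights)
```

For a 3x3 teacher layer and a 1x3 student layer of the same width, the student holds one third of the weights per layer, which is the kind of capacity gap that an intermediate teacher is meant to bridge.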
