LG AS IV · Nov 5, 2021

Oracle Teacher: Leveraging Target Information for Better Knowledge Distillation of CTC Models

Ji Won Yoon, Hyung Yong Kim, Hyeonseung Lee, Sunghwan Ahn, Nam Soo Kim

arXiv:2111.03664v4 · 1.6

Originality: Incremental advance

AI Analysis

This is an incremental advance aimed at researchers and practitioners working on sequence learning tasks such as speech and text recognition. It focuses on making knowledge distillation more effective and efficient.

The paper addresses knowledge distillation for CTC-based sequence models. It introduces an Oracle Teacher that takes both the source inputs and the output labels as input, which yields more accurate CTC alignments and, in turn, better student performance. Experiments on speech recognition and scene text recognition show improved student results and faster teacher training.
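For readers new to distilling CTC models, the sketch below shows one common formulation: the student's per-frame output distribution is pulled toward the teacher's with a frame-averaged KL divergence. This is an assumption for illustration, not the Oracle Teacher paper's exact loss, which this summary does not specify. The helper names `softmax` and `frame_kd_loss` are hypothetical, and the sketch assumes teacher and student emit the same number of frames over the same output vocabulary (including the CTC blank).

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over one frame's logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def frame_kd_loss(
    teacher_logits: Sequence[Sequence[float]],
    student_logits: Sequence[Sequence[float]],
    temperature: float = 1.0,
) -> float:
    """Hypothetical frame-level KD loss: mean over frames of KL(teacher || student).

    Assumes both models produce one logit vector per frame, with identical
    frame counts and vocabularies (CTC blank included).
    """
    if len(teacher_logits) != len(student_logits):
        raise ValueError("teacher and student must have the same number of frames")
    total = 0.0
    for t_frame, s_frame in zip(teacher_logits, student_logits):
        p = softmax(t_frame, temperature)  # teacher posterior for this frame
        q = softmax(s_frame, temperature)  # student posterior for this frame
        # KL(p || q); terms with p_i == 0 contribute nothing.
        total += sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)
    return total / len(teacher_logits)
```

Identical teacher and student logits give a loss of zero, and any mismatch gives a positive loss. Many KD recipes also multiply the loss by the squared temperature when `temperature > 1`; that scaling is omitted here to keep the sketch minimal.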

Knowledge distillation (KD), best known as an effective method for model compression, aims to transfer the knowledge of a larger network (teacher) to a much smaller network (student). Conventional KD methods usually employ a teacher model trained in a supervised manner, in which output labels are treated only as targets. Extending this supervised scheme, we introduce a new type of teacher model for connectionist temporal classification (CTC)-based sequence models, namely the Oracle Teacher, which takes both the source inputs and the output labels as the teacher model's input. Because the Oracle Teacher learns a more accurate CTC alignment by referring to the target information, it can provide the student with better guidance. One potential risk of this approach is a trivial solution in which the model's output directly copies the target input. Based on the many-to-one mapping property of the CTC algorithm, we present a training strategy that effectively prevents this trivial solution and thus makes it possible to use both source and target inputs for model training. Extensive experiments are conducted on two sequence learning tasks: speech recognition and scene text recognition. The experimental results show empirically that the proposed model improves the students across these tasks while considerably speeding up the teacher model's training.
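The abstract rests its training strategy on CTC's many-to-one mapping. The sketch below illustrates only that mapping, not the paper's strategy, whose details are not given here. A frame-level CTC path is collapsed to a label sequence by first merging consecutive repeated symbols and then removing blanks, so many different paths map to the same label. The names `ctc_collapse`, `alignments_for`, and the blank symbol `"-"` are hypothetical choices made for this illustration.

```python
from itertools import groupby, product
from typing import List, Sequence, Tuple

BLANK = "-"  # hypothetical blank symbol used for illustration


def ctc_collapse(path: Sequence[str], blank: str = BLANK) -> List[str]:
    """CTC many-to-one mapping: merge consecutive repeats, then drop blanks."""
    return [sym for sym, _ in groupby(path) if sym != blank]


def alignments_for(
    label: Sequence[str], alphabet: Sequence[str], length: int, blank: str = BLANK
) -> List[Tuple[str, ...]]:
    """Brute-force every frame-level path of `length` frames that collapses to `label`."""
    target = list(label)
    return [p for p in product(alphabet, repeat=length) if ctc_collapse(p, blank) == target]


if __name__ == "__main__":
    print(ctc_collapse("cc-aa-t"))  # repeats merged, blanks removed
    print(ctc_collapse("aa-a"))     # a blank separates a genuine repeated label
    paths = alignments_for(["c", "a", "t"], ["-", "c", "a", "t"], 5)
    print(len(paths), "five-frame paths collapse to 'cat'")
```

Here `"cc-aa-t"` and `"c-a-tt"` both collapse to `["c", "a", "t"]`, while `"aa-a"` collapses to `["a", "a"]` because the blank keeps the two `a` labels distinct. The brute-force count grows exponentially with path length, so `alignments_for` is only meant for tiny examples; practical CTC implementations sum over alignments with dynamic programming instead.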
