AS, CL, SD, ML | Jun 12, 2024

Guiding Frame-Level CTC Alignments Using Self-knowledge Distillation

arXiv:2406.07909v1
Originality: Incremental advance
AI Analysis

This work addresses a specific bottleneck in ASR systems, offering an incremental improvement for speech recognition applications.

The paper tackles the problem of frame-level alignment disagreement between teacher and student models in knowledge distillation for automatic speech recognition. It introduces a self-knowledge distillation method that improves resource efficiency and performance by reducing this disagreement.

The Transformer encoder with the connectionist temporal classification (CTC) framework is widely used for automatic speech recognition (ASR). However, knowledge distillation (KD) for ASR suffers from disagreement between the teacher and student models in frame-level alignment, which ultimately prevents KD from improving the student model's performance. To resolve this problem, this paper introduces a self-knowledge distillation (SKD) method that guides the frame-level alignment during training. In contrast to the conventional method, which uses separate teacher and student models, this study introduces a simple and effective method that shares encoder layers and uses the sub-model as the student model. Overall, our approach improves both resource efficiency and performance. We also conduct an experimental analysis of the spike timings to illustrate that the proposed method improves performance by reducing the alignment disagreement.
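To make "frame-level alignment disagreement" and "spike timings" concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the paper's implementation: the blank index `0`, the toy logits, the helper names, and the choice of per-frame KL divergence as the distillation term are all hypothetical. The abstract does not specify the exact loss.

```python
import math

BLANK = 0  # assumed index of the CTC blank symbol (hypothetical choice)


def softmax(logits):
    """Numerically stable softmax over one frame's logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def frame_kd_loss(teacher_logits, student_logits):
    """Mean per-frame KL(teacher || student).

    KL divergence is a common distillation choice; it is an assumption
    here, not a loss confirmed by the abstract.
    """
    total = 0.0
    for t_row, s_row in zip(teacher_logits, student_logits):
        p = softmax(t_row)
        q = softmax(s_row)
        total += sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return total / len(teacher_logits)


def spike_frames(logits):
    """Frame indices whose most likely symbol is non-blank (CTC 'spikes')."""
    spikes = []
    for t, row in enumerate(logits):
        best = max(range(len(row)), key=row.__getitem__)
        if best != BLANK:
            spikes.append(t)
    return spikes


# Toy outputs: 4 frames, vocabulary {blank, 'a', 'b'}.
# "Teacher" stands for the full encoder; "student" for the sub-model that
# shares its lower layers. Both decode to "a b", but the first spike
# falls on a different frame.
teacher = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [4.0, 0.0, 0.0], [0.0, 0.0, 4.0]]
student = [[0.0, 4.0, 0.0], [4.0, 0.0, 0.0], [4.0, 0.0, 0.0], [0.0, 0.0, 4.0]]

teacher_spikes = spike_frames(teacher)
student_spikes = spike_frames(student)
mismatch = frame_kd_loss(teacher, student)
```

In this toy case both models emit the same label sequence, yet their spikes land on different frames, so a frame-wise distillation term is large even though the transcripts agree. This is the kind of disagreement the abstract describes.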

Code Implementations: 1 repo