CL | AI | LG | Oct 6, 2021

KNOT: Knowledge Distillation using Optimal Transport for Solving NLP Tasks

arXiv:2110.02432v2 | 580 citations | Has Code
AI Analysis

This work addresses knowledge distillation for NLP tasks, offering an incremental improvement in semantic transfer.

The authors distill semantic knowledge from multiple teacher networks into a single student network for NLP tasks. The student improves on the baseline in Semantic Distance while performing on par with entropy-based distillation on accuracy and F1.

We propose a new approach, Knowledge Distillation using Optimal Transport (KNOT), to distill natural language semantic knowledge from multiple teacher networks to a student network. KNOT trains a (global) student model by minimizing the optimal transport cost between the probability distribution it assigns over the labels and the weighted sum of the probabilities predicted by the (local) teacher models, under the constraint that the student model does not have access to the teacher models' parameters or training data. To evaluate the quality of knowledge transfer, we introduce a new metric, Semantic Distance (SD), that measures semantic closeness between the predicted and ground truth label distributions. The proposed method shows improvements in the global model's SD performance over the baseline across three NLP tasks while performing on par with Entropy-based distillation on standard accuracy and F1 metrics. The implementation pertaining to this work is publicly available at: https://github.com/declare-lab/KNOT.
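
The objective described in the abstract can be illustrated with a minimal sketch. It is not taken from the paper or its repository: the function names, the ground-cost matrix over labels, the teacher weights, and the use of entropy-regularized Sinkhorn iterations are all assumptions made for illustration, and the actual KNOT formulation may differ.

```python
import numpy as np


def teacher_target(teacher_probs, weights):
    """Weighted sum of teacher label distributions (weights normalized to sum to 1)."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return weights @ np.asarray(teacher_probs, dtype=float)


def sinkhorn_cost(p, q, cost, reg=0.1, n_iters=200):
    """Approximate optimal transport cost between label distributions p and q.

    Uses entropy-regularized Sinkhorn iterations (an assumed solver choice).
    Returns the transport cost and the transport plan.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    kernel = np.exp(-cost / reg)          # Gibbs kernel of the ground cost
    u = np.ones_like(p)
    for _ in range(n_iters):
        v = q / (kernel.T @ u)            # scale columns toward marginal q
        u = p / (kernel @ v)              # scale rows toward marginal p
    plan = u[:, None] * kernel * v[None, :]
    return float(np.sum(plan * cost)), plan


# Hypothetical 3-label task (negative, neutral, positive); the ground cost
# |i - j| encodes that negative is semantically further from positive than
# from neutral. This cost matrix is an illustrative assumption.
labels = np.arange(3)
ground_cost = np.abs(labels[:, None] - labels[None, :]).astype(float)

teachers = [[0.1, 0.2, 0.7],              # local teacher 1 prediction
            [0.3, 0.4, 0.3]]              # local teacher 2 prediction
target = teacher_target(teachers, weights=[0.5, 0.5])

student = np.array([0.2, 0.5, 0.3])       # student prediction for the same input
loss, plan = sinkhorn_cost(student, target, ground_cost)
print("OT distillation loss:", loss)
```

Only the teachers' predicted probabilities enter the computation, which matches the stated constraint that the student sees neither the teachers' parameters nor their training data. In a real training loop the loss would need to be differentiable with respect to the student's outputs, for example by implementing the same iterations in an autodiff framework.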

Code Implementations: 2 repos