CL · LG · Oct 11, 2022

From Mimicking to Integrating: Knowledge Integration for Pre-Trained Language Models

Peking U · Tencent
arXiv:2210.05230v1291 citations · h-index: 65
Originality: Highly original
AI Analysis

This work addresses the need to reduce the computational cost and environmental impact of AI by enabling efficient reuse of existing PLMs, though its advance over prior model reuse techniques is incremental.

The paper tackles the problem of reusing pre-trained language models (PLMs) without human annotations by proposing Knowledge Integration (KI), a paradigm to merge knowledge from multiple teacher PLMs into a versatile student model, achieving substantial improvements on benchmark datasets.

Investigating better ways to reuse released pre-trained language models (PLMs) can significantly reduce computational cost and potential environmental side effects. This paper explores a novel PLM reuse paradigm, Knowledge Integration (KI). Without any human annotations, KI aims to merge the knowledge of different teacher PLMs, each of which specializes in a different classification problem, into a versatile student model. To achieve this, we first derive the correlation between virtual golden supervision and teacher predictions. We then design a Model Uncertainty-aware Knowledge Integration (MUKI) framework to recover the golden supervision for the student. Specifically, MUKI adopts Monte-Carlo Dropout to estimate model uncertainty for the supervision integration. An instance-wise re-weighting mechanism based on the margin of uncertainty scores is further incorporated to deal with potentially conflicting supervision from the teachers. Experimental results demonstrate that MUKI achieves substantial improvements over baselines on benchmark datasets. Further analysis shows that MUKI generalizes well to merging teacher models with heterogeneous architectures, and even to teachers specialized in cross-lingual datasets.
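The uncertainty-based integration described in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the function names (`mc_dropout_predict`, `integrate`), the use of normalized predictive entropy as the uncertainty score, the softmax weighting of teachers, and the dropout noise applied directly to logits (standing in for a real network run with dropout kept active at inference) are all assumptions made for illustration. It assumes teachers cover disjoint label sets and that at least two teachers are given.

```python
import math
import random


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def mc_dropout_predict(logits, n_passes=20, drop_prob=0.1, rng=None):
    """Average class probabilities over several stochastic passes.

    Hypothetical stand-in for a teacher PLM: each pass randomly zeroes
    logits (dropout-like noise) instead of running a real network.
    """
    rng = rng or random.Random(0)
    mean = [0.0] * len(logits)
    for _ in range(n_passes):
        noisy = [0.0 if rng.random() < drop_prob else x / (1 - drop_prob)
                 for x in logits]
        probs = softmax(noisy)
        mean = [m + p / n_passes for m, p in zip(mean, probs)]
    return mean


def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def integrate(teacher_probs):
    """Merge teacher distributions over disjoint label sets.

    Returns (merged distribution over the union label set, margin).
    The margin between the two lowest uncertainty scores is used here
    as an assumed instance weight: a small margin suggests the teachers
    conflict, so the instance should count less in the student loss.
    """
    # Assumed uncertainty score: entropy normalized to [0, 1].
    uncertainties = [entropy(p) / math.log(len(p)) for p in teacher_probs]
    # Less uncertain teachers receive a larger share of probability mass.
    weights = softmax([-u for u in uncertainties])
    merged = [w * p for w, probs in zip(weights, teacher_probs) for p in probs]
    ranked = sorted(uncertainties)
    margin = ranked[1] - ranked[0]
    return merged, margin


# Teacher A (2 classes) is confident; teacher B (3 classes) is not.
probs_a = mc_dropout_predict([4.0, 0.0])
probs_b = mc_dropout_predict([0.1, 0.0, 0.05])
merged, margin = integrate([probs_a, probs_b])
```

In this toy setup the confident teacher's classes receive most of the merged probability mass, and the margin could multiply the student's per-instance loss so that ambiguous inputs contribute less.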

Code Implementations: 1 repo