CL AI LG | Oct 19, 2022

Language Model Decomposition: Quantifying the Dependency and Correlation of Language Models

arXiv:2210.10289v2 | 290 citations
Originality: Incremental advance
AI Analysis

This work addresses redundancy among pre-trained language models, a question of interest to NLP researchers. It suggests that current SOTA models are highly correlated and that more diverse models are needed to advance the state of the art.

The paper tackles the lack of a theoretical framework for studying relationships between pre-trained language models by investigating their linear dependency, finding that BERT and 11 BERT-like models are 91% linearly dependent.

Pre-trained language models (LMs), such as BERT (Devlin et al., 2018) and its variants, have led to significant improvements on various NLP tasks in past years. However, a theoretical framework for studying their relationships is still missing. In this paper, we fill this gap by investigating the linear dependency between pre-trained LMs. The linear dependency of LMs is defined analogously to the linear dependency of vectors. We propose Language Model Decomposition (LMD) to represent an LM using a linear combination of other LMs as basis, and derive the closed-form solution. A goodness-of-fit metric for LMD similar to the coefficient of determination is defined and used to measure the linear dependency of a set of LMs. In experiments, we find that BERT and eleven (11) BERT-like LMs are 91% linearly dependent. This observation suggests that current state-of-the-art (SOTA) LMs are highly "correlated". To further advance SOTA we need more diverse and novel LMs that are less dependent on existing LMs.
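The idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation and rests on simplifying assumptions: each LM is reduced to a matrix of embeddings computed on the same set of inputs, the "linear combination" is an ordinary least-squares fit of the target's embeddings on the concatenated basis embeddings, and the goodness of fit is a plain R²-style ratio. The function name `lmd_fit` and the synthetic data are hypothetical; the paper's exact formulation and metric may differ.

```python
import numpy as np


def lmd_fit(target, bases):
    """Least-squares sketch of decomposing one LM onto others (hypothetical helper).

    target: (n, d) array of the target LM's embeddings on n inputs.
    bases:  list of (n, d_i) arrays, the basis LMs' embeddings on the same inputs.
    Returns the weight matrix and an R^2-style goodness of fit.
    """
    # Stack all basis embeddings side by side and center every column,
    # so the fit is equivalent to a regression with an intercept.
    X = np.hstack(bases)
    Xc = X - X.mean(axis=0)
    Yc = target - target.mean(axis=0)

    # Closed-form least-squares solution (computed via lstsq for stability).
    W, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)

    # Fraction of the target's variance explained by the basis LMs.
    ss_res = np.sum((Yc - Xc @ W) ** 2)
    ss_tot = np.sum(Yc ** 2)
    return W, 1.0 - ss_res / ss_tot


# Synthetic stand-ins for embeddings (assumption: real use would feed
# embeddings from actual models evaluated on a shared corpus).
rng = np.random.default_rng(0)
basis_a = rng.normal(size=(200, 8))
basis_b = rng.normal(size=(200, 8))

# A target that is an exact linear mix of the bases is fully dependent.
dependent = basis_a @ rng.normal(size=(8, 4)) + basis_b @ rng.normal(size=(8, 4))
# A target drawn independently should be explained only weakly.
independent = rng.normal(size=(200, 4))

_, r2_dependent = lmd_fit(dependent, [basis_a, basis_b])
_, r2_independent = lmd_fit(independent, [basis_a, basis_b])
print(f"dependent target:   R^2 = {r2_dependent:.3f}")
print(f"independent target: R^2 = {r2_independent:.3f}")
```

Under these assumptions, a score near 1 means the target adds little beyond what the basis models already encode, which is the sense in which the abstract calls a set of LMs "linearly dependent".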

Code Implementations: 1 repo