CL | AI | LG | Nov 3, 2022

Overcoming Barriers to Skill Injection in Language Modeling: Case Study in Arithmetic

arXiv:2211.02098v1 | 6 citations | h-index: 18
Originality: Highly original
AI Analysis

This work addresses the limited ability of language models to handle numeric comprehension and arithmetic reasoning, which matters for tasks requiring strict mathematical logic, although its contribution to overall model versatility is incremental.

The paper tackles the problem of enabling large pre-trained language models to perform mathematical reasoning without losing their linguistic abilities, achieving this through a novel framework that prevents catastrophic forgetting during skill injection.

Through their transfer learning abilities, highly-parameterized large pre-trained language models have dominated the NLP landscape for a multitude of downstream language tasks. Though linguistically proficient, the inability of these models to incorporate the learning of non-linguistic entities (numerals and arithmetic reasoning) limits their usage for tasks that require numeric comprehension or strict mathematical reasoning. However, as we illustrate in this paper, building a general-purpose language model that also happens to be proficient in mathematical reasoning is not as straightforward as training it on a numeric dataset. In this work, we develop a novel framework that enables language models to be mathematically proficient while retaining their linguistic prowess. Specifically, we offer information-theoretic interventions to overcome the catastrophic forgetting of linguistic skills that occurs while injecting non-linguistic skills into language models.
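The abstract does not say which information-theoretic interventions the authors use. As an illustration only, the sketch below shows one well-known regularizer of this kind, elastic weight consolidation (EWC), which penalizes movement of parameters in proportion to a diagonal Fisher information estimate. It is a swapped-in technique chosen for clarity and may differ from the paper's framework. The names `ewc_penalty`, `total_loss`, `anchor_params`, `fisher_diag`, and `strength` are hypothetical.

```python
def ewc_penalty(params, anchor_params, fisher_diag, strength=1.0):
    """EWC-style penalty: (strength / 2) * sum_i F_i * (theta_i - theta*_i)^2.

    params        -- current parameter values (flat list of floats)
    anchor_params -- values learned on the original (linguistic) task
    fisher_diag   -- assumed per-parameter Fisher information estimates
    strength      -- weight of the penalty (hypothetical hyperparameter)
    """
    if not (len(params) == len(anchor_params) == len(fisher_diag)):
        raise ValueError("all inputs must have the same length")
    return 0.5 * strength * sum(
        f * (p - a) ** 2
        for p, a, f in zip(params, anchor_params, fisher_diag)
    )


def total_loss(task_loss, params, anchor_params, fisher_diag, strength=1.0):
    """New-skill (e.g. arithmetic) loss plus the penalty for drifting
    away from parameters that mattered for the original task."""
    return task_loss + ewc_penalty(params, anchor_params, fisher_diag, strength)
```

Parameters with a large Fisher value are held close to their original values, while parameters with a small Fisher value remain free to adapt to the new skill.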

Code Implementations: 1 repo
