CL · AI · LG · May 14, 2023

Learning Non-linguistic Skills without Sacrificing Linguistic Proficiency

arXiv:2305.08246v1 · 224 citations
Originality: Incremental advance
AI Analysis

This work addresses a critical issue for Math-NLP practitioners by enabling LLMs to learn new skills without losing core linguistic capabilities, though it is an incremental improvement over existing methods.

The paper tackles the problem of catastrophic forgetting in LLMs when injecting non-linguistic skills such as arithmetic reasoning. It presents a novel framework that outperforms state-of-the-art models in both skill acquisition and linguistic retention while using only 1/4 of the non-linguistic training data and no additional synthetic linguistic data.

The field of Math-NLP has witnessed significant growth in recent years, motivated by the desire to extend LLM performance to the learning of non-linguistic notions (numerals and, subsequently, arithmetic reasoning). However, non-linguistic skill injection typically comes at a cost for LLMs: it leads to catastrophic forgetting of core linguistic skills, a consequence that often remains unaddressed in the literature. While Math-NLP has produced LLMs that closely approximate the mathematical skills of a grade-schooler or the arithmetic reasoning skills of a calculator, these models lose their practicality if they concomitantly shed their linguistic capabilities. In this work, we take a closer look at the phenomenon of catastrophic forgetting as it pertains to LLMs and offer a novel framework for non-linguistic skill injection based on information-theoretic interventions and skill-specific losses that enable the learning of strict arithmetic reasoning. Our model outperforms the state of the art both on injected non-linguistic skills and on linguistic knowledge retention, and does so with a fraction of the non-linguistic training data (1/4) and zero additional synthetic linguistic training data.

Code Implementations: 1 repo