Towards Fundamental Language Models: Does Linguistic Competence Scale with Model Size?
This work addresses efficiency and interpretability issues in NLP by advocating modular, tool-augmented systems, though the contribution is incremental because it builds on existing ideas of model decomposition.
The paper addresses the limitations of large language models by proposing the Fundamental Language Model (FLM) paradigm, which separates linguistic competence from factual memorization. Evaluating models from 135M to 32B parameters, it finds that internal factual knowledge grows significantly faster than linguistic competence, which suggests that model size is more closely tied to memorization than to core language ability.
Large Language Models offer impressive language capabilities but suffer from well-known limitations, including hallucinations, biases, privacy concerns, and high computational costs. These issues are largely driven by the combination of linguistic competence and factual memorization within a single monolithic model. This paper introduces and empirically supports the Fundamental Language Model (FLM) paradigm, which advocates for smaller, linguistically competent models that offload factual retrieval to external tools. We evaluate models ranging from 135M to 32B parameters across three dimensions: linguistic competence, external factual knowledge, and internal factual knowledge. Our findings reveal that while both linguistic competence and factual knowledge improve with scale, internal factual knowledge grows significantly faster, suggesting that model size is more closely tied to memorization than to core language ability. These results support a modular approach to language modeling, where compact, linguistically proficient models serve as the foundation for tool-augmented systems. The FLM paradigm offers a path toward more efficient, interpretable, and sustainable NLP solutions.
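One way to make the "grows significantly faster" comparison concrete is to fit, for each evaluation dimension, the slope of its score against the logarithm of parameter count and then compare slopes. The sketch below is an illustration under stated assumptions, not the paper's procedure: the abstract does not say how growth was measured, the helper name `scaling_slope` is hypothetical, and the scores in the usage lines are synthetic placeholders rather than reported results.

```python
import math


def scaling_slope(param_counts, scores):
    """Least-squares slope of score against log10(parameter count).

    Assumption: a log-linear fit is a reasonable first summary of how a
    benchmark score changes with model size. The source does not specify
    the fitting method, so treat this as one possible choice.
    """
    if len(param_counts) != len(scores) or len(scores) < 2:
        raise ValueError("need at least two (size, score) pairs of equal length")

    xs = [math.log10(p) for p in param_counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(scores) / n

    # Standard closed-form simple linear regression slope: Sxy / Sxx.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all models have the same size; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    return sxy / sxx


# Synthetic placeholder values for illustration only (NOT from the paper).
sizes = [1e8, 1e9, 1e10]
linguistic_scores = [0.70, 0.75, 0.80]
internal_fact_scores = [0.20, 0.40, 0.60]

linguistic_slope = scaling_slope(sizes, linguistic_scores)
internal_slope = scaling_slope(sizes, internal_fact_scores)

# A larger slope means the score gains more per tenfold increase in size.
print(f"linguistic: {linguistic_slope:.3f} per decade of parameters")
print(f"internal factual: {internal_slope:.3f} per decade of parameters")
```

With real benchmark scores substituted for the placeholders, a clearly larger slope for internal factual knowledge than for linguistic competence would match the pattern the abstract describes. A per-decade slope is used here because the evaluated models span more than two orders of magnitude in size.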