CL | AI | LG | Jul 1, 2024

$\text{Memory}^3$: Language Modeling with Explicit Memory

arXiv:2407.01178v1 | 40 citations | h-index: 16
Originality: Incremental advance
AI Analysis

This work addresses the cost of training and running large language models, a concern for AI researchers and practitioners. It appears incremental because it builds on existing memory concepts in LLMs.

The authors tackle the high cost of training and inference in large language models by introducing explicit memory, a memory format cheaper than both model parameters and retrieval-augmented generation (RAG). Their 2.4B model, trained from scratch, outperforms much larger LLMs and RAG models while decoding faster than RAG.

The training and inference of large language models (LLMs) are together a costly process that transports knowledge from raw data to meaningful computation. Inspired by the memory hierarchy of the human brain, we reduce this cost by equipping LLMs with explicit memory, a memory format cheaper than model parameters and text retrieval-augmented generation (RAG). Conceptually, with most of its knowledge externalized to explicit memories, the LLM can enjoy a smaller parameter size, training cost, and inference cost, all proportional to the amount of remaining "abstract knowledge". As a preliminary proof of concept, we train from scratch a 2.4B LLM, which achieves better performance than much larger LLMs as well as RAG models, and maintains higher decoding speed than RAG. The model is named $\text{Memory}^3$, since explicit memory is the third form of memory in LLMs after implicit memory (model parameters) and working memory (context key-values). We introduce a memory circuitry theory to support the externalization of knowledge, and present novel techniques including a memory sparsification mechanism that makes storage tractable and a two-stage pretraining scheme that facilitates memory formation.
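To make the idea of a third memory form concrete, here is a minimal sketch, not the paper's implementation. It assumes that explicit memories take the form of precomputed attention key-value pairs that are retrieved and placed alongside the context key-values (working memory) inside self-attention. All function and variable names are hypothetical, and retrieval, memory sparsification, and multi-head details are omitted.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    # Single-head scaled dot-product attention for one query vector.
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

def attend_with_explicit_memory(query, context_kv, memory_kv):
    # ASSUMPTION: explicit memory is modeled as extra (key, value) pairs that
    # are concatenated in front of the context (working-memory) pairs.
    keys = [k for k, _ in memory_kv] + [k for k, _ in context_kv]
    values = [v for _, v in memory_kv] + [v for _, v in context_kv]
    return attend(query, keys, values)

# Toy data: one context pair and one retrieved memory pair (hypothetical).
context_kv = [([1.0, 0.0], [1.0, 0.0])]
memory_kv = [([0.0, 1.0], [0.0, 1.0])]
query = [0.0, 4.0]  # aligned with the memory key, not the context key

out_without = attend_with_explicit_memory(query, context_kv, [])
out_with = attend_with_explicit_memory(query, context_kv, memory_kv)
print(out_without)
print(out_with)
```

In this toy setup, the query can only read the context value when no memory is supplied. Once the memory pair is added, most of the attention weight shifts to the retrieved value, and the model parameters (the implicit memory) play no part in storing that value.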

