CLFLJan 10, 2023

Memory Augmented Large Language Models are Computationally Universal

arXiv:2301.04589v160 citationsh-index: 54
AI Analysis

This addresses the computational limitations of language models for AI researchers, providing a foundational proof of their universal capabilities.

The paper demonstrates that transformer-based large language models, when augmented with an external read-write memory, become computationally universal, specifically showing that Flan-U-PaLM 540B can exactly simulate a universal Turing machine without modifying its weights.

We show that transformer-based large language models are computationally universal when augmented with an external memory. Any deterministic language model that conditions on strings of bounded length is equivalent to a finite automaton, hence computationally limited. However, augmenting such models with a read-write memory creates the possibility of processing arbitrarily large inputs and, potentially, simulating any algorithm. We establish that an existing large language model, Flan-U-PaLM 540B, can be combined with an associative read-write memory to exactly simulate the execution of a universal Turing machine, $U_{15,2}$. A key aspect of the finding is that it does not require any modification of the language model weights. Instead, the construction relies solely on designing a form of stored instruction computer that can subsequently be programmed with a specific set of prompts.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes