Esoteric Language Models
This work addresses efficiency and performance bottlenecks in diffusion-based language modeling and is aimed at AI researchers and practitioners. It proposes a new hybrid approach rather than an incremental improvement.
The paper tackles two shortcomings of diffusion-based language models: they underperform autoregressive models in perplexity, and they lack inference-efficiency features such as KV caching. It introduces Eso-LMs, a fusion of autoregressive and masked diffusion models that sets a new state of the art on standard language modeling benchmarks and achieves up to 65x faster inference than standard masked diffusion models.
Diffusion-based language models offer a compelling alternative to autoregressive (AR) models by enabling parallel and controllable generation. Within this family, Masked Diffusion Models (MDMs) achieve the strongest performance, but they still underperform AR models in perplexity and lack key inference-time efficiency features, most notably KV caching. In this work, we introduce Eso-LMs, a new family of models that fuses the AR and MDM paradigms, enabling smooth interpolation between their perplexities while overcoming their respective limitations. Eso-LMs set a new state of the art on standard language modeling benchmarks. Crucially, we are the **first to introduce KV caching for MDMs** while preserving parallel generation, significantly improving inference efficiency. Combined with an optimized sampling schedule, our method achieves up to **65x** faster inference than standard MDMs and **4x** faster inference than prior semi-autoregressive approaches. We provide the code and model checkpoints on the project page: [http://s-sahoo.github.io/Eso-LMs](http://s-sahoo.github.io/Eso-LMs)
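Why KV caching is hard for MDMs can be illustrated with a toy example. AR models can cache because each token attends only to earlier tokens, so its per-layer states never change once computed. In a standard MDM, attention is bidirectional, so revealing a new token changes the hidden states of tokens revealed earlier, and any cached keys and values built from those states go stale. The sketch below is a minimal illustration under loudly stated assumptions: a single attention head with random weights, positional information folded into the embeddings, and hypothetical helper names (`attention_outputs`, `reveal_order_causal_mask`, `hidden_of_first_two`). It is a generic property of attention, not the Eso-LM architecture or the authors' code. It compares bidirectional attention with a mask in which each revealed token attends only to itself and to tokens revealed before it, in whatever order the sampler unmasks them.

```python
import numpy as np

# Toy single-head attention with random weights (hypothetical; not Eso-LM weights).
rng = np.random.default_rng(0)
D = 4
W_Q, W_K, W_V = (rng.standard_normal((D, D)) for _ in range(3))


def attention_outputs(x, allowed):
    """Hidden states of one attention layer.

    x: (n, D) embeddings of the revealed tokens, listed in reveal order
       (positional information is assumed to be folded into x).
    allowed: (n, n) bool matrix; allowed[i, j] means token i may attend to token j.
    In a deeper stack these outputs would feed the next layer's K/V projections,
    so they must not change if those K/V are to be cached.
    """
    q, k, v = x @ W_Q, x @ W_K, x @ W_V
    scores = q @ k.T / np.sqrt(D)
    scores = np.where(allowed, scores, -np.inf)
    scores -= scores.max(axis=1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v


def bidirectional_mask(n):
    # Every revealed token sees every other one, as in a standard MDM.
    return np.ones((n, n), dtype=bool)


def reveal_order_causal_mask(n):
    # Token i sees itself and only the tokens revealed before it.
    return np.tril(np.ones((n, n), dtype=bool))


# Embeddings of three tokens, in the order a sampler happens to unmask them.
tokens = rng.standard_normal((3, D))


def hidden_of_first_two(mask_fn, n_revealed):
    """Hidden states of the first two revealed tokens after n_revealed reveals."""
    h = attention_outputs(tokens[:n_revealed], mask_fn(n_revealed))
    return h[:2]


# Reveal a third token and check whether the first two tokens' states moved.
print(np.allclose(hidden_of_first_two(reveal_order_causal_mask, 2),
                  hidden_of_first_two(reveal_order_causal_mask, 3)))  # True: cache stays valid
print(np.allclose(hidden_of_first_two(bidirectional_mask, 2),
                  hidden_of_first_two(bidirectional_mask, 3)))  # False: cache goes stale
```

In this one-layer toy the keys and values themselves are plain projections of the embeddings; the quantity that matters is the layer output, which in a deeper network becomes the input to the next layer's key and value projections. The sketch does not cover revealing several tokens in the same step; how Eso-LMs combine caching with parallel generation is specified in the paper and its released code.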