AI, CL, MA (Jun 25, 2025)

Language Modeling by Language Models

arXiv:2506.20249v1, 8 citations, h-index: 26
AI Analysis

This work addresses the challenge of automating architecture discovery in AI, offering a scalable method that could accelerate research, though it is incremental in applying existing techniques to a new domain.

The paper tackles the problem of discovering novel language model architectures with Genesys, a multi-agent LLM system that simulates the stages of research and uses a genetic programming backbone to generate designs efficiently. It reports 1,162 new designs, 1,062 of them fully verified through pre-training, with the best ones outperforming known architectures such as GPT2 and Mamba2 on 6 out of 9 benchmarks.

Can we leverage LLMs to model the process of discovering novel language model (LM) architectures? Inspired by real research, we propose a multi-agent LLM approach that simulates the conventional stages of research, from ideation and literature search (proposal stage) to design implementation (code generation), generative pre-training, and downstream evaluation (verification). Using ideas from scaling laws, our system, Genesys, employs a Ladder of Scales approach: new designs are proposed, adversarially reviewed, implemented, and selectively verified at increasingly larger model scales (14M to 350M parameters) with a narrowing budget (the number of models we can train at each scale). To help make discovery efficient and factorizable, Genesys uses a novel genetic programming backbone, which we show has empirical advantages over commonly used direct prompt generation workflows (e.g., an improvement of roughly 86 percentage points in successful design generation, a key bottleneck). We report experiments involving 1,162 newly discovered designs (1,062 fully verified through pre-training) and find the best designs to be highly competitive with known architectures (e.g., they outperform GPT2, Mamba2, etc., on 6/9 common benchmarks). We couple these results with comprehensive system-level ablations and formal results, which give broader insights into the design of effective autonomous discovery systems.
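
The Ladder of Scales idea described in the abstract (verify designs at increasingly larger scales while the training budget narrows) can be illustrated with a minimal Python sketch. This is not the Genesys implementation: the function name `ladder_of_scales`, the `evaluate` callback, and the rule of carrying forward the top-ranked designs from the previous rung are all assumptions made for illustration, and any concrete scale or budget values passed in are hypothetical apart from the 14M and 350M endpoints mentioned in the abstract.

```python
from typing import Callable, Sequence


def ladder_of_scales(
    designs: Sequence[str],
    scales: Sequence[int],
    budgets: Sequence[int],
    evaluate: Callable[[str, int], float],
) -> list[str]:
    """Rank designs by verifying them on rungs of increasing model scale.

    Hypothetical sketch: `budgets[i]` is the number of models that may be
    trained at `scales[i]`; `evaluate(design, scale)` returns a score where
    higher is better (for example, a downstream benchmark average).
    """
    if len(scales) != len(budgets):
        raise ValueError("need exactly one budget per scale")

    survivors = list(designs)
    for scale, budget in zip(scales, budgets):
        # Assumption: only the best `budget` designs from the previous rung
        # are trained at this scale (on the first rung, input order is used).
        candidates = survivors[:budget]
        scored = [(evaluate(design, scale), design) for design in candidates]
        # Sort best-first; Python's sort is stable, so ties keep prior order.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        survivors = [design for _, design in scored]
    return survivors
```

Called with, say, three rungs and budgets that shrink from rung to rung, the function trains every surviving candidate only at the scales it has earned, so most of the compute at the largest scale goes to designs that already looked promising at smaller ones.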

Code Implementations: 1 repo