AI CL MA Jun 25, 2025

Language Modeling by Language Models

Junyan Cheng, Peter Clark, Kyle Richardson

arXiv:2506.20249v118.18 citations h-index: 26 Has Code

Originality Incremental advance

AI Analysis

This work addresses the challenge of automating architecture discovery in AI, offering a scalable method that could accelerate research, though it is incremental in applying existing techniques to a new domain.

The paper tackles the discovery of novel language model architectures with Genesys, a multi-agent LLM system that simulates the stages of research and uses a genetic programming backbone to generate and verify new designs efficiently. It reports 1,162 newly discovered designs (1,062 fully verified through pre-training), the best of which outperform known architectures such as GPT2 and Mamba2 on 6 of 9 benchmarks.

Can we leverage LLMs to model the process of discovering novel language model (LM) architectures? Inspired by real research, we propose a multi-agent LLM approach that simulates the conventional stages of research, from ideation and literature search (proposal stage) to design implementation (code generation), generative pre-training, and downstream evaluation (verification). Using ideas from scaling laws, our system, Genesys, employs a Ladder of Scales approach; new designs are proposed, adversarially reviewed, implemented, and selectively verified at increasingly larger model scales (14M to 350M parameters) with a narrowing budget (the number of models we can train at each scale). To help make discovery efficient and factorizable, Genesys uses a novel genetic programming backbone, which we show has empirical advantages over commonly used direct prompt generation workflows (e.g., ~86 percentage point improvement in successful design generation, a key bottleneck). We report experiments involving 1,162 newly discovered designs (1,062 fully verified through pre-training) and find the best designs to be highly competitive with known architectures (e.g., outperform GPT2, Mamba2, etc., on 6/9 common benchmarks). We couple these results with comprehensive system-level ablations and formal results, which give broader insights into the design of effective autonomous discovery systems.
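
The Ladder of Scales idea in the abstract (verify many designs at a small scale, then progressively fewer at larger scales) can be illustrated with a minimal sketch. This is not the Genesys implementation: the function name `ladder_of_scales`, the `evaluate` callback standing in for pre-training plus downstream evaluation, the toy scores, the per-scale budgets, and the intermediate 70M scale are all hypothetical. Only the 14M and 350M endpoints come from the abstract.

```python
from typing import Callable, Dict, List, Tuple


def ladder_of_scales(
    designs: List[str],
    ladder: List[Tuple[int, int]],
    evaluate: Callable[[str, int], float],
) -> Dict[int, List[str]]:
    """Verify designs at growing model scales with a shrinking budget.

    ladder:   (parameter_count, budget) pairs, smallest scale first, where
              budget is the number of models that may be trained at that scale.
    evaluate: hypothetical stand-in for pre-training and downstream
              evaluation; returns a score where higher is better.
    Returns the designs verified at each scale, best first.
    """
    survivors = list(designs)
    verified: Dict[int, List[str]] = {}
    for params, budget in ladder:
        # Only `budget` models can be trained at this scale, so take the
        # best-ranked survivors from the previous (smaller) scale.
        candidates = survivors[:budget]
        ranked = sorted(candidates, key=lambda d: evaluate(d, params), reverse=True)
        verified[params] = ranked
        survivors = ranked
    return verified


# Toy usage: fixed scores replace real training runs (assumed values).
toy_scores = {"design_a": 0.2, "design_b": 0.9, "design_c": 0.5, "design_d": 0.7}


def toy_evaluate(design: str, params: int) -> float:
    return toy_scores[design]


# Hypothetical ladder: budgets narrow as the scale grows.
ladder = [(14_000_000, 4), (70_000_000, 2), (350_000_000, 1)]
result = ladder_of_scales(list(toy_scores), ladder, toy_evaluate)
for params, ranked in result.items():
    print(f"{params:>11,d} params: {ranked}")
```

In this sketch the ranking at one scale decides which designs earn a training slot at the next, which is one simple way to read "selectively verified"; the selection rule actually used by Genesys may differ.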
