LG CL | Jul 8, 2025

Growing Transformers: Modular Composition and Layer-wise Expansion on a Frozen Substrate

arXiv:2507.07129v2 | h-index: 1

Originality: Highly original

AI Analysis

The approach offers a more resource-efficient and modular way to scale AI systems, which may benefit machine learning researchers and developers.

The paper addresses the cost and inflexibility of monolithic training for large language models by proposing a constructive scaling paradigm in which models are grown incrementally on a frozen foundation. In a controlled study, the grown model rivals the performance of a monolithically trained baseline of the same size.

The prevailing paradigm for scaling large language models (LLMs) involves monolithic, end-to-end training, a resource-intensive process that lacks flexibility. This paper explores an alternative, constructive scaling paradigm, enabled by the principle of emergent semantics in Transformers with frozen, non-semantic input embeddings. We posit that because high-level meaning is a compositional property of a Transformer's deep layers, not its input vectors, the embedding layer and trained lower layers can serve as a fixed foundation. This frees backpropagation to update only the newly added components, making incremental growth viable. We operationalize this with a layer-wise constructive methodology that combines strict layer freezing in early stages with efficient, holistic fine-tuning of the entire model stack via low-rank adaptation (LoRA) as complexity increases. This method not only demonstrates stable convergence but also reveals a direct correlation between model depth and the emergence of complex reasoning abilities, such as those required for SQuAD, which are absent in shallower models. In a controlled study, our constructively grown model rivals the performance of a monolithically trained baseline of the same size, validating the efficiency and efficacy of the approach. Our findings suggest a shift from monolithic optimization towards a more biological or constructive model of AI development, opening a path to more resource-efficient scaling, continual learning, and a more modular approach to building powerful AI systems. We release all code and models to facilitate further research.
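
To make the growth schedule in the abstract concrete, the following is a minimal bookkeeping sketch, not the authors' released code. It tracks which parameters would receive gradients at each stage: the embedding stays frozen throughout, each growth step freezes the existing stack and appends one trainable block, and a final stage attaches LoRA adapters to every block. All sizes (`D_MODEL`, `VOCAB`, the `12 * D_MODEL**2` per-block estimate, rank 8, four adapted matrices per block) and all names (`Module`, `grow`, `attach_lora`) are hypothetical choices for illustration; the abstract does not specify them.

```python
from dataclasses import dataclass

# Hypothetical sizes, chosen only for illustration.
D_MODEL = 1024                          # hidden size
VOCAB = 65536                           # vocabulary size
BLOCK_PARAMS = 12 * D_MODEL * D_MODEL   # rough per-block parameter estimate


@dataclass
class Module:
    """A model component reduced to its parameter count and frozen flag."""
    name: str
    n_params: int
    frozen: bool = False
    adaptable: bool = True   # whether LoRA adapters may be attached
    lora_params: int = 0     # trainable low-rank adapter parameters


def lora_param_count(d_in, d_out, rank):
    """LoRA adds two factors: A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


def grow(stack, new_module):
    """Freeze everything already in the stack, then append a trainable module."""
    for m in stack:
        m.frozen = True
    stack.append(new_module)
    return stack


def attach_lora(stack, rank, matrices_per_block=4):
    """Freeze all base weights and add adapters to every adaptable module."""
    per_block = matrices_per_block * lora_param_count(D_MODEL, D_MODEL, rank)
    for m in stack:
        m.frozen = True
        if m.adaptable:
            m.lora_params = per_block
    return stack


def trainable_params(stack):
    """Count parameters that would receive gradient updates."""
    return sum((0 if m.frozen else m.n_params) + m.lora_params for m in stack)


# The frozen, non-semantic embedding is the fixed foundation.
stack = [Module("embedding", VOCAB * D_MODEL, frozen=True, adaptable=False)]

# Early stages: strict freezing, one new trainable block per stage.
stage_trainable = []
for i in range(1, 4):
    grow(stack, Module(f"block{i}", BLOCK_PARAMS))
    stage_trainable.append(trainable_params(stack))

# Later stage: holistic fine-tuning of the whole stack via LoRA.
attach_lora(stack, rank=8)
lora_trainable = trainable_params(stack)

print(stage_trainable)   # trainable parameters at each growth stage
print(lora_trainable)    # trainable parameters during the LoRA stage
```

Under these assumed sizes, each growth stage trains only the newest block, and the LoRA stage touches every block while training far fewer parameters than a single full block. In a real framework the `frozen` flag would correspond to disabling gradients on the relevant weights.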

