AR AI DC ET PF Jun 17, 2025

Scaling Intelligence: Designing Data Centers for Next-Gen Language Models

Jesmin Jahan Tithi, Hanjiang Wu, Avishaii Abuhatzera, Fabrizio Petrini

arXiv:2506.15006v34.36 citations h-index: 9

Originality: Incremental advance

AI Analysis

This work addresses inefficient data center design for LLM workloads, a problem relevant to AI researchers and engineers, and offers an incremental advance through a comprehensive co-design approach.

The paper tackles the challenge of scaling data center architectures for large language models (LLMs). It introduces a co-design framework that evaluates network topologies and system parameters, and reports that FullFlat optical networks improve performance and scalability. The study relies on an analytical performance modeling tool that predicts LLM runtime within 10% of real-world measurements.

The explosive growth of Large Language Models (LLMs), such as GPT-4 with 1.8 trillion parameters, demands a fundamental rethinking of data center architecture to ensure scalability, efficiency, and cost-effectiveness. Our work provides a comprehensive co-design framework that jointly explores FLOPS, HBM bandwidth and capacity, multiple network topologies (two-tier vs. FullFlat optical), the size of the scale-out domain, and popular parallelism/optimization strategies used in LLMs. We introduce and evaluate FullFlat network architectures, which provide uniform high-bandwidth, low-latency connectivity between all nodes, and demonstrate their transformative impact on performance and scalability. Through detailed sensitivity analyses, we quantify the benefits of overlapping compute and communication, leveraging hardware-accelerated collectives, widening the scale-out domain, and increasing memory capacity. Our study spans both sparse (mixture of experts) and dense transformer-based LLMs, revealing how system design choices affect Model FLOPS Utilization (MFU = Model FLOPS per token * Observed tokens per second / Peak FLOPS of the hardware) and overall throughput. For the co-design study, we utilized an analytical performance modeling tool capable of predicting LLM runtime within 10% of real-world measurements. Our findings offer actionable insights and a practical roadmap for designing AI data centers that can efficiently support trillion-parameter models, reduce optimization complexity, and sustain the rapid evolution of AI capabilities.
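The MFU definition in the abstract can be illustrated with a small calculation. The sketch below is not taken from the paper's modeling tool. The function name, the example numbers, and the "6 FLOPs per parameter per token" rule of thumb for dense training are all assumptions chosen for illustration.

```python
def model_flops_utilization(flops_per_token: float,
                            tokens_per_second: float,
                            peak_flops: float) -> float:
    """MFU = model FLOPs per token * observed tokens/s / peak hardware FLOPS."""
    if peak_flops <= 0:
        raise ValueError("peak_flops must be positive")
    return flops_per_token * tokens_per_second / peak_flops


# Hypothetical dense model: 70e9 parameters, using the common
# approximation of ~6 FLOPs per parameter per training token
# (an assumption, not a figure from the paper).
params = 70e9
flops_per_token = 6 * params

# Hypothetical cluster: 1000 accelerators at 1e15 peak FLOPS each,
# observed to process 1e6 tokens per second in aggregate.
peak_flops = 1000 * 1e15
tokens_per_second = 1e6

mfu = model_flops_utilization(flops_per_token, tokens_per_second, peak_flops)
print(f"MFU = {mfu:.2f}")
```

With these made-up inputs, the cluster delivers roughly 42% of its peak FLOPS as useful model computation. The rest is lost to effects such as communication and memory stalls, which are the system parameters the co-design study varies.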
