AR · Apr 20

ChipLight: Cross-Layer Optimization of Chiplet Design with Optical Interconnects for LLM Training

arXiv:2604.18909 · 84.4 · h-index: 3
AI Analysis

This work addresses the communication bottleneck in large-scale LLM training by co-optimizing chiplet architecture, training parallel strategy, and optical interconnect topology.

ChipLight presents a cross-layer optimization method for chiplet design with optical interconnects, achieving significantly improved training efficiency for large-scale LLM training clusters.

In large-scale distributed LLM training, communication between devices becomes the key performance bottleneck. Chiplet technology integrates multiple dies into one package, scaling up node performance with higher bandwidth. Meanwhile, optical interconnect (OI) technology offers long-reach, high-bandwidth links, making it well suited for scale-out networks. Combining the two technologies has the potential to overcome communication bottlenecks both within and across packages. In this work, we present ChipLight, a cross-layer, multi-objective design and optimization method for training clusters that leverage chiplets and OI. We first abstract an architecture model for such complex clusters, co-optimizing chiplet architecture, training parallel strategy, and OI network topology. Based on this model, we tailor the design space exploration flow by combining black-box and white-box methodologies. In our experiments, ChipLight achieves significantly improved training efficiency and provides valuable design insights for the development of future training clusters.
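
To make the idea of combining black-box and white-box exploration concrete, here is a minimal Python sketch. It is not ChipLight's actual flow, which the abstract does not detail. Every name, parameter, and constant below is a hypothetical assumption chosen for illustration (`white_box_cost`, `dies_per_package`, `link_gbps`, the 10x in-package bandwidth factor, the cost weights). Hardware parameters are sampled at random (the black-box part), the parallel strategy is chosen by an analytic model (the white-box part), and the sampled designs are filtered to a Pareto front over two objectives, step time and cost.

```python
import random


def parallel_strategies(num_devices):
    """Enumerate (tp, pp, dp) degrees whose product equals num_devices."""
    out = []
    for tp in range(1, num_devices + 1):
        if num_devices % tp:
            continue
        rest = num_devices // tp
        for pp in range(1, rest + 1):
            if rest % pp == 0:
                out.append((tp, pp, rest // pp))
    return out


def white_box_cost(tp, pp, dp, dies_per_package, link_gbps):
    """Toy analytic model (hypothetical constants): returns (step_time, cost)."""
    compute = 1.0 / (tp * pp * dp)
    # Assumption: tensor-parallel traffic stays in-package (10x faster)
    # when the tensor-parallel group fits inside one package.
    tp_bw = 10.0 * link_gbps if tp <= dies_per_package else link_gbps
    comm = (tp - 1) / tp / tp_bw + (dp - 1) / dp / link_gbps
    bubble = (pp - 1) * 0.01  # assumed pipeline-bubble penalty per extra stage
    step_time = compute + comm + bubble
    cost = dies_per_package * 1.0 + link_gbps * 0.05  # assumed cost weights
    return step_time, cost


def pareto_front(points):
    """Keep points not dominated in (step_time, cost); both are minimized."""
    front = []
    for p in points:
        dominated = any(
            q["obj"][0] <= p["obj"][0]
            and q["obj"][1] <= p["obj"][1]
            and q["obj"] != p["obj"]
            for q in points
        )
        if not dominated:
            front.append(p)
    return front


def explore(num_devices=16, samples=20, seed=0):
    rng = random.Random(seed)
    strategies = parallel_strategies(num_devices)
    candidates = []
    for _ in range(samples):
        # Black-box part: randomly sample hardware parameters.
        dies = rng.choice([1, 2, 4, 8])
        bw = rng.choice([100, 200, 400, 800])
        # White-box part: pick the fastest strategy using the analytic model.
        best = min(strategies, key=lambda s: white_box_cost(*s, dies, bw)[0])
        candidates.append({
            "dies": dies,
            "link_gbps": bw,
            "strategy": best,
            "obj": white_box_cost(*best, dies, bw),
        })
    return pareto_front(candidates)


if __name__ == "__main__":
    for design in explore():
        print(design)
```

A real exploration would replace the random sampler with a proper black-box optimizer and the toy formulas with a calibrated performance and cost model. The sketch only shows how the two kinds of search can be nested.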

