Generative Design for Direct-to-Chip Liquid Cooling for Data Centers

arXiv:2604.10941

AI Analysis

For data center thermal management, this framework addresses the need for optimized cooling of heterogeneous packages with non-uniform temperature distributions, enabling more efficient and sustainable AI computing.

This work presents a generative design framework for optimizing cooling channel geometries in direct-to-chip liquid cooling for data centers. Applied to the NVIDIA GB200 Grace Blackwell Superchip, the method reduces the average temperature by more than 5 °C and the maximum temperature by more than 35 °C relative to a baseline parallel-channel design.

Rapid growth in artificial intelligence (AI) workloads is driving up data center power densities and increasing the need for advanced thermal management. Direct-to-chip liquid cooling can remove heat efficiently at the source, but many cold plate channel layouts are still designed heuristically and are not optimized for the strongly non-uniform temperature distribution of modern heterogeneous packages. This work presents a generative design framework for synthesizing cooling channel geometries for the NVIDIA GB200 Grace Blackwell Superchip. A physics-based finite-difference thermal model provides rapid steady-state temperature predictions and supplies spatial thermal feedback to a constrained reaction-diffusion process, which generates novel channel topologies while enforcing inlet/outlet and component constraints. By iterating channel generation and thermal evaluation in a closed loop, the method redistributes cooling capacity toward high-power regions and suppresses hot-spot formation. Compared with a baseline parallel-channel design, the resulting channels reduce the average temperature by more than 5 °C and the maximum temperature by more than 35 °C. Overall, the results demonstrate that coupling generative algorithms with lightweight physics-based modeling can significantly enhance direct-to-chip liquid cooling performance, supporting more sustainable scaling of AI computing.
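
The closed loop described in the abstract (thermal solve, thermal feedback, channel growth, repeat) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the paper's model: the 2D grid, the lumped convective sink `h * c`, the background loss `h_min`, the Fisher-KPP-style reaction term, the threshold `theta`, and all function names and parameter values are hypothetical. Temperatures are rises above the coolant inlet temperature in arbitrary units.

```python
"""Toy closed loop: finite-difference thermal solve + reaction-diffusion growth.

Illustrative sketch only; model form and parameters are assumptions.
"""

OFFSETS = ((-1, 0), (1, 0), (0, -1), (0, 1))


def neighbours(grid, i, j):
    """Values of the in-bounds 4-neighbours of cell (i, j)."""
    n, m = len(grid), len(grid[0])
    return [grid[i + di][j + dj] for di, dj in OFFSETS
            if 0 <= i + di < n and 0 <= j + dj < m]


def solve_temperature(q, c, k=1.0, h=0.5, h_min=0.01, sweeps=150):
    """Gauss-Seidel sweeps on k*lap(T) + q - (h_min + h*c)*T = 0.

    q is the heat source, c the channel fraction in [0, 1]. Missing
    neighbours at the edges carry no flux (adiabatic boundary).
    """
    n, m = len(q), len(q[0])
    T = [[0.0] * m for _ in range(n)]
    for _ in range(sweeps):
        for i in range(n):
            for j in range(m):
                nb = neighbours(T, i, j)
                sink = h_min + h * c[i][j]
                T[i][j] = (k * sum(nb) + q[i][j]) / (k * len(nb) + sink)
    return T


def grow_channels(c, T, fixed, D=0.2, r=1.0, theta=0.5, steps=5, dt=0.5):
    """Explicit Euler steps of dc/dt = D*lap(c) + r*c*(1-c)*(T/Tmax - theta).

    Cells listed in `fixed` are pinned: 1.0 for inlet/outlet, 0.0 for
    keep-out regions under components.
    """
    n, m = len(c), len(c[0])
    t_max = max(max(row) for row in T) or 1.0
    for _ in range(steps):
        new = [row[:] for row in c]
        for i in range(n):
            for j in range(m):
                if (i, j) in fixed:
                    new[i][j] = fixed[(i, j)]
                    continue
                nb = neighbours(c, i, j)
                lap = sum(nb) - len(nb) * c[i][j]
                react = r * c[i][j] * (1.0 - c[i][j]) * (T[i][j] / t_max - theta)
                new[i][j] = min(1.0, max(0.0, c[i][j] + dt * (D * lap + react)))
        c = new
    return c


def design(q, fixed, iterations=10):
    """Alternate thermal evaluation and channel growth; return (c, T, history)."""
    n, m = len(q), len(q[0])
    c = [[fixed.get((i, j), 0.0) for j in range(m)] for i in range(n)]
    history = []  # peak temperature seen before each growth stage
    for _ in range(iterations):
        T = solve_temperature(q, c)
        history.append(max(max(row) for row in T))
        c = grow_channels(c, T, fixed)
    return c, solve_temperature(q, c), history


if __name__ == "__main__":
    # Hypothetical 12x12 package: low background power plus one hot die.
    q = [[0.1] * 12 for _ in range(12)]
    for i in range(4, 8):
        for j in range(6, 10):
            q[i][j] = 5.0
    fixed = {(6, 0): 1.0, (6, 11): 1.0, (1, 1): 0.0}  # inlet, outlet, keep-out
    c, T, history = design(q, fixed)
    print("peak before growth:", round(history[0], 3))
    print("peak after growth: ", round(max(max(row) for row in T), 3))
```

In this sketch the reaction term is positive where the normalized temperature exceeds `theta` and negative elsewhere, so channel material is meant to accumulate near hot regions and recede from cool ones, while the pinned cells stand in for the inlet/outlet and component constraints. A realistic model would also need coolant flow, pressure drop, and a connectivity check on the channel network, none of which are represented here.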
