CXLRAMSim v1.0: System-Level Exploration of CXL Memory Expander Cards
CXLRAMSim addresses the inaccuracy of existing simulation tools for CXL memory architectures, a gap that matters to researchers and engineers optimizing server memory for AI workloads, although the contribution appears incremental because it builds on existing simulation frameworks.
The paper tackles the challenge of accurately simulating CXL-based memory expander cards in scale-up systems for LLM training and inference. It presents CXLRAMSim, the first gem5-integrated, full-system simulator that models CXL devices at their correct position on the I/O bus, enabling realistic latency-bandwidth behavior and true interleaving with system DRAM.
The growing demands of training and inference for Large Language Models (LLMs) are accelerating the adoption of scale-up systems that extend server shared memory through Compute Express Link (CXL)-based load/store interconnects. Accurate full-system simulation of such architectures remains challenging, as existing tools (all very recent) rely on simplified or non-compliant architectural models, which limits their accuracy and usability. We present CXLRAMSim, the first gem5-integrated, full-system simulator that models CXL devices at their correct position on the I/O bus, enabling the use of unmodified Linux kernels and software stacks, realistic latency-bandwidth behavior, and true interleaving with system DRAM. Our approach provides high-fidelity CXL.mem characterization and captures key challenges such as cache pollution when accessing CXL memory.
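
To make the notion of interleaving CXL memory with system DRAM concrete, the following is a minimal sketch of round-robin address interleaving across two memory targets. It is not CXLRAMSim code and does not use the gem5 API. The function names, the 256-byte granularity, and the latency figures are hypothetical placeholders chosen only for illustration.

```python
from collections import Counter


def interleave_target(addr, targets, granularity=256):
    """Return the memory target serving `addr` under round-robin interleaving.

    `granularity` is the interleave chunk size in bytes (assumed value, power of two).
    """
    if granularity <= 0 or granularity & (granularity - 1):
        raise ValueError("granularity must be a positive power of two")
    if not targets:
        raise ValueError("at least one target is required")
    # Consecutive chunks of `granularity` bytes rotate through the targets.
    return targets[(addr // granularity) % len(targets)]


def average_latency_ns(addresses, targets, latency_ns, granularity=256):
    """Average access latency over `addresses`, given a per-target latency map."""
    hits = Counter(interleave_target(a, targets, granularity) for a in addresses)
    total = sum(hits.values())
    if total == 0:
        raise ValueError("no addresses given")
    return sum(latency_ns[t] * n for t, n in hits.items()) / total


# Hypothetical latencies, for illustration only (not measured values).
TARGETS = ["dram", "cxl"]
LATENCY_NS = {"dram": 100.0, "cxl": 250.0}

# Sequential 64-byte accesses over 1 KiB split evenly between the two targets.
avg = average_latency_ns(range(0, 1024, 64), TARGETS, LATENCY_NS)
```

With an even split between the two targets, the average latency in this toy model is simply the mean of the two assumed latencies. A full-system simulator additionally has to account for bandwidth contention, queuing, and cache effects, which this sketch deliberately ignores.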