9.7CRMay 27
HammerSim: A System-Level Tool to Model RowHammerKaustav Goswami, Ayaz Akram, Hari Venugopalan et al.
Modern architecture research relies on simulators to evaluate system security, yet analyzing emerging hardware vulnerabilities like RowHammer requires full-system visibility. As RowHammer vulnerabilities worsen with continuous technology scaling, existing simulators lack the system-level models needed to study complex OS effects and cross-layer mitigations. This tool deficiency leaves modern computing platforms exposed to severe reliability and security risks. In this work, we present HammerSim, a gem5-based framework for modeling RowHammer at the full-system level. HammerSim integrates probability-driven bitflip modeling to realistically capture the behavior of RowHammer. It further enables evaluation of hardware and software mitigations such as TRR and selective ECC. We validate HammerSim's bitflip modeling against real DDR4 DIMMs using JS divergence, demonstrating its utility in studying attacks, defenses, and benign workload susceptibility. Our framework provides an extensible platform to bridge the gap between hardware experiments and architectural simulation.
12.3ARMay 28
Space-Control: Process-Level Isolation for Sharing CXL-based Disaggregated MemoryKaustav Goswami, Sean Peisert, Venkatesh Akella et al.
Memory disaggregation via CXL enables multi-host resource sharing. However, existing CXL sharing mechanisms enforce coarse-grained, host-level permissions only, leaving isolation to the operating system. Today, virtual memory enables process-level isolation on a host and CXL enables host-level isolation. This creates a critical security gap: the absence of process-level memory isolation in shared disaggregated memory. We present Space-Control, an architectural abstraction that introduces a cross-host identity primitive to enforce confidentiality and integrity. We decouple authorization from the untrusted OS using a hardware-rooted validation engine (SPACE) to establish immutable process identity and a Permission Checker at the memory egress point for fine-grained permission validation. Our design supports 127 concurrent processes across 255 hosts with only 1.56% storage overhead. Cycle-level evaluation using gem5 + SST shows that Space-Control incurs a minimal 3.3% performance penalty with a modest 16 KiB cache, providing a practical and scalable foundation for secure, process-level memory disaggregation.
49.6ARMay 26
CXL-ClusterSim: Modeling CXL-based Disaggregated Memory Cluster for Pooling and Sharing using gem5 and SSTKaustav Goswami, Maryam Babaie, Hoa Nguyen et al.
Large-scale AI training and inference require hundreds of gigabytes to terabytes of DRAM with high peak to average utilization ratios, resulting in overprovisioning. In cloud computing, DRAM constitutes a significant share of the cost. Yet, as shown by recent articles, DRAM is heavily under utilized. Memory disaggregation is a solution to both these problems. With the advent of the CXL protocol, there is renewed interest in designing and optimizing computing systems with disaggregated memory. However, at present, there are limited simulation tools available for exploring the design space and evaluating the performance tradeoffs in computer systems with disaggregated memory. In this paper, we propose CXL-ClusterSim, a full-system modeling and simulation framework by combining the gem5 simulator for fidelity, with the Structural Simulation Toolkit (SST) for parallel simulation. We outline the challenges in creating this simulation infrastructure and present a design that is scalable, flexible, and reasonably fast to help computer architects to explore the design space of CXL-based disaggregated memory and identify new opportunities for hardware/software codesign and performance optimization.
71.2ARMar 20Code
Toward Reproducible and Standardized Computer Architecture Simulation with gem5Kunal Pai, Harshil Patel, Erin Le et al.
Reproducibility in simulation-based computer architecture research requires coordinating artifacts like disk images, kernels, and benchmarks, but existing workflows are inconsistent. We improve gem5, an open-source simulator with over 1600 forks, and gem5 Resources, a centralized repository of over 2000 pre-packaged artifacts, to address these issues. While gem5 Resources enables artifact sharing, researchers still face challenges. Creating custom disk images is complex and time-consuming, with no standardized process across ISAs, making it difficult to extend and share images. gem5 provides limited guest-host communication features through a set of predefined exit events that restrict researchers' ability to dynamically control and monitor simulations. Lastly, running simulations with multiple workloads requires researchers to write custom external scripts to coordinate multiple gem5 simulations which creates error-prone and hard-to-reproduce workflows. To overcome this, we introduce several features in gem5 and gem5 Resources. We standardize disk-image creation across x86, ARM, and RISC-V using Packer, and provide validated base images with pre-annotated benchmark suites (NPB, GAPBS). We provide 12 new disk images, 6 new kernels, and over 200 workloads across three ISAs. We refactor the exit event system to a class-based model and introduce hypercalls for enhanced guest-host communication that allows researchers to define custom behavior for their exit events. We also provide a utility to remotely monitor simulations and the gem5-bridge driver for user-space m5 operations. Additionally, we implemented Suites and MultiSim to enable parallel full-system simulations from gem5 configuration scripts, eliminating the need for external scripting. These features reduce setup complexity and provide extensible, validated resources that improve reproducibility and standardization.
LGDec 7, 2020
The Tribes of Machine Learning and the Realm of Computer ArchitectureAyaz Akram, Jason Lowe-Power
Machine learning techniques have influenced the field of computer architecture like many other fields. This paper studies how the fundamental machine learning techniques can be applied towards computer architecture problems. We also provide a detailed survey of computer architecture research that employs different machine learning methods. Finally, we present some future opportunities and the outstanding challenges that need to be overcome to exploit full potential of machine learning for computer architecture.
DCOct 25, 2020
Performance Analysis of Scientific Computing Workloads on Trusted Execution EnvironmentsAyaz Akram, Anna Giannakou, Venkatesh Akella et al.
Scientific computing sometimes involves computation on sensitive data. Depending on the data and the execution environment, the HPC (high-performance computing) user or data provider may require confidentiality and/or integrity guarantees. To study the applicability of hardware-based trusted execution environments (TEEs) to enable secure scientific computing, we deeply analyze the performance impact of AMD SEV and Intel SGX for diverse HPC benchmarks including traditional scientific computing, machine learning, graph analytics, and emerging scientific computing workloads. We observe three main findings: 1) SEV requires careful memory placement on large scale NUMA machines (1$\times$$-$3.4$\times$ slowdown without and 1$\times$$-$1.15$\times$ slowdown with NUMA aware placement), 2) virtualization$-$a prerequisite for SEV$-$results in performance degradation for workloads with irregular memory accesses and large working sets (1$\times$$-$4$\times$ slowdown compared to native execution for graph applications) and 3) SGX is inappropriate for HPC given its limited secure memory size and inflexible programming model (1.2$\times$$-$126$\times$ slowdown over unsecure execution). Finally, we discuss forthcoming new TEE designs and their potential impact on scientific computing.