Stefanos Kaxiras

h-index: 31

Papers: 4

Citations: 4,078

Novelty: 68%

AI Score: 37

Ranked #92,349 of 194,257 authors (top 48%); #2,270 in CR (top 34%)

4 Papers

3.1 | AR | Jul 9

Who Needs DRAM? We Have Fiber

Hannah Atmer, Thiemo Voigt, Yuan Yao et al.

The rising pressure on DRAM availability and contract pricing reflects generative AI's massive high-performance memory requirements. This pressure is heavily compounded by hyperscale data center expansion, which now consumes a significant portion of global DRAM output. In this work, we propose a new architecture: Fiber Memory, which reimagines the role of optical fiber in a hyperscale data center, deploying it as an active, recirculating delay-line memory for immutable data, such as large language model (LLM) weights. We present a data-parallel optical broadcast delay-line memory architecture that accounts for fiber's physical realities. By incorporating space-division multiplexed multi-core fibers (MCFs), passive optical tap-and-amplify interfaces, co-packaged optics (CPO), and regional all-optical regeneration, our case study evaluation demonstrates that Fiber Memory can eliminate redundant weight storage across 10,000 AI accelerators and reduce weight-delivery energy by over 70% compared to traditional HBM3e configurations.
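As a rough illustration of the delay-line idea in this abstract, the amount of data a fiber loop can hold is the line rate multiplied by the one-way transit time. The sketch below is not the paper's model: the function names, the group index of about 1.468 (a typical figure for standard single-mode fiber near 1550 nm), and the example length and rate are all assumptions chosen only to show the arithmetic.

```python
# Back-of-the-envelope model of a recirculating fiber delay line.
# All parameter values below are illustrative assumptions, not figures
# taken from the paper.

C_VACUUM_M_PER_S = 299_792_458.0  # speed of light in vacuum (exact SI value)


def transit_time_s(length_m: float, group_index: float = 1.468) -> float:
    """Time for a signal to traverse the fiber once.

    group_index is an assumed typical value for single-mode fiber.
    """
    return length_m * group_index / C_VACUUM_M_PER_S


def bits_in_flight(
    length_m: float,
    bit_rate_bps: float,
    group_index: float = 1.468,
    channels: int = 1,
) -> float:
    """Bits stored "in flight": aggregate rate times transit time.

    channels stands in for any parallelism (cores, wavelengths); treating
    it as a simple multiplier is an assumption of this sketch.
    """
    return bit_rate_bps * channels * transit_time_s(length_m, group_index)


if __name__ == "__main__":
    length_m = 10_000.0   # hypothetical 10 km loop
    rate_bps = 100e9      # hypothetical 100 Gb/s per channel
    t = transit_time_s(length_m)
    print(f"transit time: {t * 1e6:.2f} microseconds")
    print(f"capacity:     {bits_in_flight(length_m, rate_bps) / 8e6:.3f} MB")
```

The same transit time is also the worst-case wait for a given bit to come around again, which is why capacity and access latency grow together in a design of this kind.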

1.2 | AR | Dec 26, 2025

Prefill vs. Decode Bottlenecks: SRAM-Frequency Tradeoffs and the Memory-Bandwidth Ceiling

Hannah Atmer, Yuan Yao, Thiemo Voigt et al.

Energy consumption dictates the cost and environmental impact of deploying Large Language Models. This paper investigates the impact of on-chip SRAM size and operating frequency on the energy efficiency and performance of LLM inference, focusing on the distinct behaviors of the compute-bound prefill and memory-bound decode phases. Our simulation methodology combines OpenRAM for energy modeling, LLMCompass for latency simulation, and ScaleSIM for systolic array operational intensity. Our findings show that total energy use is predominantly determined by SRAM size in both phases, with larger buffers significantly increasing static energy due to leakage, which is not offset by corresponding latency benefits. We quantitatively explore the memory-bandwidth bottleneck, demonstrating that while high operating frequencies reduce prefill latency, their positive impact on memory-bound decode latency is capped by the external memory bandwidth. Counter-intuitively, high compute frequency can reduce total energy by reducing execution time and consequently decreasing static energy consumption more than the resulting dynamic power increase. We identify an optimal hardware configuration for the simulated workload: high operating frequencies (1200MHz-1400MHz) and a small local buffer size of 32KB to 64KB. This combination achieves the best energy-delay product, balancing low latency with high energy efficiency. Furthermore, we demonstrate how memory bandwidth acts as a performance ceiling, and that increasing compute frequency only yields performance gains up to the point where the workload becomes memory-bound. This analysis provides concrete architectural insights for designing energy-efficient LLM accelerators, especially for datacenters aiming to minimize their energy overhead.
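The memory-bandwidth ceiling described in this abstract can be illustrated with a generic roofline-style estimate, together with a simple static-plus-dynamic energy sum. This is a minimal sketch under stated assumptions, not the paper's OpenRAM/LLMCompass/ScaleSIM methodology; every function name and number here is hypothetical.

```python
# Roofline-style sketch: attainable throughput is the lesser of the compute
# peak (which scales with clock frequency) and the memory-bound limit
# (bandwidth times operational intensity). Names and values are assumptions.


def attainable_flops(
    freq_hz: float,
    flops_per_cycle: float,
    mem_bw_bytes_per_s: float,
    op_intensity_flops_per_byte: float,
) -> float:
    """Attainable FLOP/s under a simple roofline model."""
    compute_peak = freq_hz * flops_per_cycle
    memory_limit = mem_bw_bytes_per_s * op_intensity_flops_per_byte
    return min(compute_peak, memory_limit)


def latency_s(
    work_flops: float,
    freq_hz: float,
    flops_per_cycle: float,
    mem_bw_bytes_per_s: float,
    op_intensity_flops_per_byte: float,
) -> float:
    """Execution time for a fixed amount of work at the attainable rate."""
    rate = attainable_flops(
        freq_hz, flops_per_cycle, mem_bw_bytes_per_s, op_intensity_flops_per_byte
    )
    return work_flops / rate


def total_energy_j(
    latency: float, static_power_w: float, dynamic_energy_j: float
) -> float:
    """Static (leakage) energy accrues with time; dynamic energy is per-run."""
    return static_power_w * latency + dynamic_energy_j


if __name__ == "__main__":
    # Hypothetical accelerator: 1000 FLOPs/cycle, 100 GB/s external memory.
    for phase, intensity in (("prefill-like", 100.0), ("decode-like", 1.0)):
        for freq in (0.8e9, 1.4e9):
            t = latency_s(1e12, freq, 1000.0, 100e9, intensity)
            print(f"{phase:13s} f={freq / 1e6:6.0f} MHz  latency={t:.3f} s")
```

In this toy model, raising the frequency shortens the high-intensity case but leaves the low-intensity case unchanged once it hits the bandwidth limit. Shorter runs also accrue less static energy in `total_energy_j`, which is the mechanism the abstract describes.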

3.8 | CR | Sep 22, 2021

"It's a Trap!"-How Speculation Invariance Can Be Abused with Forward Speculative Interference

Pavlos Aimoniotis, Christos Sakalis, Magnus Själander et al.

Speculative side-channel attacks access sensitive data and use transmitters to leak the data during wrong-path execution. Various defenses have been proposed to prevent such information leakage. However, not all speculatively executed instructions are unsafe: Recent work demonstrates that speculation invariant instructions are independent of speculative control-flow paths and are guaranteed to eventually commit, regardless of the speculation outcome. Compile-time information coupled with run-time mechanisms can then selectively lift defenses for speculation invariant instructions, reclaiming some of the lost performance. Unfortunately, speculation invariant instructions can easily be manipulated by a form of speculative interference to leak information via a new side-channel that we introduce in this paper. We show that forward speculative interference, where older speculative instructions interfere with younger speculation invariant instructions, effectively turns them into transmitters for secret data accessed during speculation. We demonstrate forward speculative interference on actual hardware, by selectively filling the reorder buffer (ROB) with instructions, pushing speculation invariant instructions in-or-out of the ROB on demand, based on a speculatively accessed secret. This reveals the speculatively accessed secret, as the occupancy of the ROB itself becomes a new speculative side-channel.

3.8 | CR | Mar 19, 2021

Selectively Delaying Instructions to Prevent Microarchitectural Replay Attacks

Christos Sakalis, Stefanos Kaxiras, Magnus Själander

MicroScope, and microarchitectural replay attacks in general, take advantage of the characteristics of speculative execution to trap the execution of the victim application in an infinite loop, enabling the attacker to amplify a side-channel attack by executing it indefinitely. Due to the nature of the replay, it can be used to effectively attack security critical trusted execution environments (secure enclaves), even under conditions where a side-channel attack would not be possible. At the same time, unlike speculative side-channel attacks, MicroScope can be used to amplify the correct path of execution, rendering many existing speculative side-channel defences ineffective. In this work, we generalize microarchitectural replay attacks beyond MicroScope and present an efficient defence against them. We make the observation that such attacks rely on repeated squashes of so-called "replay handles" and that the instructions causing the side-channel must reside in the same reorder buffer window as the handles. We propose Delay-on-Squash, a technique for tracking squashed instructions and preventing them from being replayed by speculative replay handles. Our evaluation shows that it is possible to achieve full security against microarchitectural replay attacks with very modest hardware requirements, while still maintaining 97% of the insecure baseline performance.
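The general idea of tracking squashed instructions and holding them back while they are still speculative can be sketched with a toy model. This is not the paper's hardware design: the `SquashTracker` class, the use of an exact set keyed by program counter, and the rule for clearing entries at retirement are all assumptions made for illustration.

```python
# Toy model of a "delay on squash" policy. An instruction that was squashed
# once is not allowed to execute again while it is still speculative, so a
# replay handle cannot force it to run repeatedly. Hypothetical sketch only.

from typing import Iterable, Set


class SquashTracker:
    def __init__(self) -> None:
        # Program counters of instructions that have been squashed.
        self._squashed: Set[int] = set()

    def record_squash(self, pcs: Iterable[int]) -> None:
        """Remember every instruction discarded by a pipeline squash."""
        self._squashed.update(pcs)

    def may_issue(self, pc: int, speculative: bool) -> bool:
        """Allow issue unless the instruction is both speculative and
        previously squashed; in that case it must wait."""
        return not (speculative and pc in self._squashed)

    def retire(self, pc: int) -> None:
        """Assumption of this sketch: once an instruction commits, it is
        no longer tracked."""
        self._squashed.discard(pc)


if __name__ == "__main__":
    tracker = SquashTracker()
    tracker.record_squash([0x40, 0x44])
    print(tracker.may_issue(0x40, speculative=True))   # delayed
    print(tracker.may_issue(0x40, speculative=False))  # safe to execute
    print(tracker.may_issue(0x48, speculative=True))   # never squashed
```

A real implementation would need bounded storage, so an exact set like this one is a simplification; the sketch only shows the decision rule.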