Magnus Själander

4 Papers

39.1 | AR | May 22
DAE4HLS: Exposing Memory-Level Parallelism for High-Level Synthesis using Explicit Decoupling

David Metz, Magnus Själander

High-level synthesis (HLS) performs well for simple memory access patterns, such as sequential accesses that can be turned into bursts, or accesses into small datasets that can be stored in scratchpads. This limits HLS to accelerating only the low-hanging fruit, where either memory-level parallelism is trivially abundant, due to simple access patterns, or latency is low, due to the small dataset. Applications with more complex access patterns on large datasets would also benefit from acceleration, and would especially benefit from the reduction in design and verification effort that HLS promises. In this paper, we present DAE4HLS, a decoupled access-execute (DAE) paradigm for HLS. We propose a new programming model for explicitly decoupling requests and responses, which unlocks memory-level parallelism that a compiler cannot otherwise provide automatically. We apply the DAE4HLS paradigm to the commercial AMD Vitis HLS toolchain and show that the existing AXI stream and AXI burst interfaces can be repurposed for explicit decoupling. We further apply the paradigm to a dynamic-HLS framework, which is better suited to irregular workloads than statically scheduled HLS. We show that support for explicit decoupling improves performance, achieving a total speedup of 10-79$\times$.
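To make the benefit of decoupling requests from responses concrete, the following is a minimal cycle-count sketch in Python. It is a toy model, not the DAE4HLS programming model or any Vitis HLS API: the function name `decoupled_cycles`, the one-request-per-cycle issue rate, the fixed latency, and the `max_outstanding` limit are all assumptions made for illustration, and the numbers it produces are unrelated to the speedups reported in the abstract.

```python
from collections import deque


def decoupled_cycles(n_accesses: int, latency: int, max_outstanding: int) -> int:
    """Toy model: cycles needed to issue and consume n_accesses memory requests.

    Assumptions (illustrative only): one request can be issued per cycle,
    every response arrives exactly `latency` cycles after its request,
    responses are consumed in order, and at most `max_outstanding`
    requests may be in flight at once.
    """
    in_flight = deque()  # arrival cycle of each outstanding response
    cycle = 0
    issued = 0
    consumed = 0
    while consumed < n_accesses:
        # Consume at most one response per cycle, if it has arrived.
        if in_flight and in_flight[0] <= cycle:
            in_flight.popleft()
            consumed += 1
        # Issue a new request if there is room for another outstanding one.
        if issued < n_accesses and len(in_flight) < max_outstanding:
            in_flight.append(cycle + latency)
            issued += 1
        cycle += 1
    return cycle


if __name__ == "__main__":
    n, lat = 1000, 100
    coupled = decoupled_cycles(n, lat, max_outstanding=1)  # request, then wait
    for window in (1, 16, 64, 128):
        cycles = decoupled_cycles(n, lat, window)
        print(f"outstanding={window:3d}  cycles={cycles:6d}  "
              f"speedup={coupled / cycles:.1f}x")
```

With `max_outstanding=1` the model degenerates to the coupled case, where every access pays the full latency. Once the window covers the latency, the total approaches `n_accesses + latency`, which is the memory-level parallelism the abstract refers to.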

LG | Jun 24, 2024
CATBench: A Compiler Autotuning Benchmarking Suite for Black-box Optimization

Jacob O. Tørring, Carl Hvarfner, Luigi Nardi et al.

Bayesian optimization is a powerful method for automating the tuning of compilers. The complex landscape of autotuning provides a myriad of rarely considered structural challenges for black-box optimizers, and the lack of standardized benchmarks has limited the study of Bayesian optimization within the domain. To address this, we present CATBench, a comprehensive benchmarking suite that captures the complexities of compiler autotuning, ranging from discrete, conditional, and permutation parameter types to known and unknown binary constraints, as well as both multi-fidelity and multi-objective evaluations. The benchmarks in CATBench span a range of machine learning-oriented computations, from tensor algebra to image processing and clustering, and use state-of-the-art compilers, such as TACO and RISE/ELEVATE. CATBench offers a unified interface for evaluating Bayesian optimization algorithms, promoting reproducibility and innovation through an easy-to-use, fully containerized setup of both surrogate and real-world compiler optimization tasks. We validate CATBench on several state-of-the-art algorithms, revealing their strengths and weaknesses and demonstrating the suite's potential for advancing both Bayesian optimization and compiler autotuning research.
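The parameter types named in the abstract can be illustrated with a small, self-contained Python sketch. This is not CATBench's interface: the parameter names (`tile`, `unroll_factor`, `loop_order`), the constraint, the cost function, and the use of plain random search instead of Bayesian optimization are all hypothetical choices made only to show what a discrete, conditional, and permutation parameter with a known constraint looks like to a black-box optimizer.

```python
import random


def sample_config(rng: random.Random) -> dict:
    """Draw one configuration from a hypothetical compiler search space."""
    cfg = {
        "tile": rng.choice([8, 16, 32, 64]),      # discrete parameter
        "use_unroll": rng.choice([True, False]),  # categorical parameter
    }
    if cfg["use_unroll"]:
        # Conditional parameter: only exists when unrolling is enabled.
        cfg["unroll_factor"] = rng.choice([2, 4, 8])
    order = ["i", "j", "k"]
    rng.shuffle(order)
    cfg["loop_order"] = tuple(order)              # permutation parameter
    return cfg


def is_feasible(cfg: dict) -> bool:
    """Known constraint (made up): the unrolled tile must fit a 256-element budget."""
    return cfg["tile"] * cfg.get("unroll_factor", 1) <= 256


def toy_cost(cfg: dict) -> float:
    """Synthetic stand-in for a measured runtime; lower is better."""
    cost = abs(cfg["tile"] - 32) / 32
    cost += 0.0 if cfg["loop_order"] == ("i", "k", "j") else 0.5
    cost += 1.0 / cfg.get("unroll_factor", 1)
    return cost


def random_search(n_samples: int, seed: int = 0):
    """Baseline black-box optimizer: keep the best feasible random sample."""
    rng = random.Random(seed)
    best_cfg, best_cost = None, float("inf")
    for _ in range(n_samples):
        cfg = sample_config(rng)
        if not is_feasible(cfg):
            continue  # skip configurations that violate the known constraint
        cost = toy_cost(cfg)
        if cost < best_cost:
            best_cfg, best_cost = cfg, cost
    return best_cfg, best_cost


if __name__ == "__main__":
    cfg, cost = random_search(200)
    print(cfg, cost)
```

A Bayesian optimizer would replace `random_search` with a surrogate-guided loop, but it would have to cope with the same structure: parameters that appear and disappear, orderings rather than numeric values, and regions of the space that are infeasible.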

CR | Sep 22, 2021
"It's a Trap!"-How Speculation Invariance Can Be Abused with Forward Speculative Interference

Pavlos Aimoniotis, Christos Sakalis, Magnus Själander et al.

Speculative side-channel attacks access sensitive data and use transmitters to leak the data during wrong-path execution. Various defenses have been proposed to prevent such information leakage. However, not all speculatively executed instructions are unsafe: Recent work demonstrates that speculation invariant instructions are independent of speculative control-flow paths and are guaranteed to eventually commit, regardless of the speculation outcome. Compile-time information coupled with run-time mechanisms can then selectively lift defenses for speculation invariant instructions, reclaiming some of the lost performance. Unfortunately, speculation invariant instructions can easily be manipulated by a form of speculative interference to leak information via a new side-channel that we introduce in this paper. We show that forward speculative interference, where older speculative instructions interfere with younger speculation invariant instructions, effectively turns them into transmitters for secret data accessed during speculation. We demonstrate forward speculative interference on actual hardware by selectively filling the reorder buffer (ROB) with instructions, pushing speculation invariant instructions in or out of the ROB on demand, based on a speculatively accessed secret. This reveals the speculatively accessed secret, as the occupancy of the ROB itself becomes a new speculative side-channel.

CR | Mar 19, 2021
Selectively Delaying Instructions to Prevent Microarchitectural Replay Attacks

Christos Sakalis, Stefanos Kaxiras, Magnus Själander

MicroScope, and microarchitectural replay attacks in general, take advantage of the characteristics of speculative execution to trap the execution of the victim application in an infinite loop, enabling the attacker to amplify a side-channel attack by executing it indefinitely. Due to the nature of the replay, it can be used to effectively attack security critical trusted execution environments (secure enclaves), even under conditions where a side-channel attack would not be possible. At the same time, unlike speculative side-channel attacks, MicroScope can be used to amplify the correct path of execution, rendering many existing speculative side-channel defences ineffective. In this work, we generalize microarchitectural replay attacks beyond MicroScope and present an efficient defence against them. We make the observation that such attacks rely on repeated squashes of so-called "replay handles" and that the instructions causing the side-channel must reside in the same reorder buffer window as the handles. We propose Delay-on-Squash, a technique for tracking squashed instructions and preventing them from being replayed by speculative replay handles. Our evaluation shows that it is possible to achieve full security against microarchitectural replay attacks with very modest hardware requirements, while still maintaining 97% of the insecure baseline performance.
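The idea of tracking squashed instructions and refusing to replay them speculatively can be sketched as a small Python model. This is only an illustration of the principle stated in the abstract, not the hardware design evaluated in the paper: the class `SquashTracker`, its methods, the use of an exact set of program counters, and the single-instruction attack loop are all assumptions.

```python
class SquashTracker:
    """Toy model of a structure that remembers squashed instructions."""

    def __init__(self) -> None:
        self.squashed_pcs = set()  # assumption: exact set of program counters

    def record_squash(self, pcs) -> None:
        """Remember every instruction that was squashed."""
        self.squashed_pcs.update(pcs)

    def may_execute(self, pc: int, speculative: bool) -> bool:
        """A previously squashed instruction must wait until it is non-speculative."""
        return not (speculative and pc in self.squashed_pcs)

    def commit(self, pc: int) -> None:
        """Once an instruction commits, it no longer needs to be tracked."""
        self.squashed_pcs.discard(pc)


def observations(n_replays: int, defended: bool) -> int:
    """Count how often the attacker observes the victim instruction executing.

    Each iteration models one replay: the victim instruction sits in the
    shadow of a replay handle (so it is speculative), and the handle then
    triggers a squash that rolls the victim back.
    """
    victim_pc = 0x40  # hypothetical address of the leaking instruction
    tracker = SquashTracker()
    seen = 0
    for _ in range(n_replays):
        if not defended or tracker.may_execute(victim_pc, speculative=True):
            seen += 1  # the side effect is visible to the attacker
        tracker.record_squash([victim_pc])  # the replay handle causes a squash
    return seen


if __name__ == "__main__":
    print("undefended:", observations(100, defended=False))
    print("defended:  ", observations(100, defended=True))
```

In this model the attacker sees the victim instruction once per replay without the defence, but only once in total with it, which removes the amplification that replay attacks rely on.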