Peterson Yuhala

h-index4

3papers

17citations

Novelty50%

AI Score48

Ranked #26,858 of 194,257 authors (top 14%)#525 in CR (top 8%)

3 Papers

6.4CRApr 30Code

DRAMatic Speedup: Accelerating HE Operations on a Processing-in-Memory System

Niklas Klinger, Jonas Sander, Peterson Yuhala et al.

Homomorphic encryption (HE) is a promising technology for confidential cloud computing, as it allows computations on encrypted data. However, HE is computationally expensive and often memory-bound on conventional computer architectures. Processing-in-Memory (PIM) is an alternative hardware architecture that integrates processing units and memory on the same chip or memory module. PIM enables higher memory bandwidth than conventional architectures and could thus be suitable for accelerating HE. We present DRAMatic, which implements operations foundational to HE on UPMEM PIM -- a programmable general-purpose PIM system developed by UPMEM. DRAMatic incorporates many arithmetic optimizations, including residue number system and number-theoretic transform techniques, and can support the large parameters required for secure homomorphic evaluations. It achieves a 334 times speed-up compared to previous HE implementations on UPMEM PIM. We also evaluate DRAMatic against Microsoft SEAL, a popular open-source HE library, regarding both runtime and energy efficiency. The results show that DRAMatic significantly closes the gap between Microsoft SEAL and HE implementations on UPMEM PIM. However, we also show that DRAMatic is currently constrained by data transfer overhead and limited multiplication performance on UPMEM PIM hardware. Finally, we discuss potential hardware extensions to UPMEM PIM.

6.6ETMar 24

PIM-CACHE: High-Efficiency Content-Aware Copy for Processing-In-Memory

Peterson Yuhala, Mpoki Mwaisela, Pascal Felber et al.

Processing-in-memory (PIM) architectures bring computation closer to data, reducing the processor-memory transfer bottleneck in traditional processor-centric designs. Novel hardware solutions, such as UPMEM's in-memory processing technology, achieve this by integrating low-power DRAM processing units (DPUs) into memory DIMMs, enabling massive parallelism and improved memory bandwidth. However, paradoxically, these PIM architectures introduce mandatory coarse-grained data transfers between host DRAM and DPUs, which often become the new bottleneck. We present PIM-CACHE, a lightweight data staging layer that dynamically eliminates redundant data transfers to PIM DPUs by exploiting workload similarity, achieving content-aware copy (CAC). We evaluate PIM-CACHE on both synthetic workloads and real-world genome datasets, demonstrating its effectiveness in reducing PIM data transfer overhead.

12.3CRApr 7, 2021Code

Plinius: Secure and Persistent Machine Learning Model Training

Peterson Yuhala, Pascal Felber, Valerio Schiavoni et al.

With the increasing popularity of cloud based machine learning (ML) techniques there comes a need for privacy and integrity guarantees for ML data. In addition, the significant scalability challenges faced by DRAM coupled with the high access-times of secondary storage represent a huge performance bottleneck for ML systems. While solutions exist to tackle the security aspect, performance remains an issue. Persistent memory (PM) is resilient to power loss (unlike DRAM), provides fast and fine-granular access to memory (unlike disk storage) and has latency and bandwidth close to DRAM (in the order of ns and GB/s, respectively). We present PLINIUS, a ML framework using Intel SGX enclaves for secure training of ML models and PM for fault tolerance guarantees. PLINIUS uses a novel mirroring mechanism to create and maintain (i) encrypted mirror copies of ML models on PM, and (ii) encrypted training data in byte-addressable PM, for near-instantaneous data recovery after a system failure. Compared to disk-based checkpointing systems, PLINIUS is 3.2x and 3.7x faster respectively for saving and restoring models on real PM hardware, achieving robust and secure ML model training in SGX enclaves.