48.0 CR May 31
GPU Acceleration of Learning With Errors KEMs Using OpenACC for Post-Quantum Cryptography
Tiziana Liberati, Nitin Shukla, Matteo Barbieri et al.
Shor's algorithm proved that asymmetric cryptographic protocols based on the integer factorization and discrete logarithm problems are no longer safe in a world with large-scale quantum computers. As a result, Post-Quantum Cryptography (PQC) has been developed over the last few years, seeking cryptographic primitives resistant to quantum attacks. One of the main hard problems underlying PQC schemes is the Learning with Errors (LWE) problem, which is significantly more computationally intensive than its classical predecessors. In this work, we present a Key Encapsulation Mechanism (KEM) based on plain LWE and develop a GPU-oriented implementation using OpenACC. We evaluate the performance of our accelerated application in terms of both time-to-solution and energy-to-solution, considering bare-metal and containerized executions across multiple NVIDIA GPU models and generations. Our implementation achieves significant acceleration across all tested GPU platforms. In particular, on the NVIDIA Grace Hopper Superchip, it attains up to a $208\times$ speedup over a multithreaded CPU baseline and enables the execution of problem sizes that are impractical on CPU architectures due to memory and synchronization constraints. Energy consumption analysis also shows $\approx 2\times$ better efficiency when using the Superchip compared to systems equipped with x86-based CPUs and NVIDIA H100 GPUs. These results highlight the effectiveness of GPU acceleration for computationally demanding LWE-based cryptographic workloads.
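To make the plain-LWE setting concrete, here is a minimal sketch of Regev-style single-bit encryption, the textbook construction that LWE-based KEMs build on. It is not the KEM from the paper: the parameters `N`, `M`, `Q`, the function names, and the ternary error distribution are illustrative assumptions, and Python's `random` module is not a cryptographic generator.

```python
import random

# Toy parameters (assumptions for illustration; far too small to be secure).
N, M, Q = 16, 64, 3329


def keygen(rng):
    """Return ((A, b), s) with b = A*s + e mod Q and small error e."""
    s = [rng.randrange(Q) for _ in range(N)]
    A = [[rng.randrange(Q) for _ in range(N)] for _ in range(M)]
    e = [rng.choice((-1, 0, 1)) for _ in range(M)]
    b = [(sum(a * x for a, x in zip(row, s)) + err) % Q
         for row, err in zip(A, e)]
    return (A, b), s


def encrypt(pk, bit, rng):
    """Encrypt one bit by summing a random subset of the LWE samples."""
    A, b = pk
    r = [rng.randrange(2) for _ in range(M)]
    u = [sum(r[i] * A[i][j] for i in range(M)) % Q for j in range(N)]
    v = (sum(ri * bi for ri, bi in zip(r, b)) + bit * (Q // 2)) % Q
    return u, v


def decrypt(sk, ct):
    """Recover the bit: v - u.s equals r.e + bit*(Q//2) modulo Q."""
    u, v = ct
    d = (v - sum(ui * si for ui, si in zip(u, sk))) % Q
    # |r.e| <= M < Q/4, so d lies near 0 for bit 0 and near Q/2 for bit 1.
    return 1 if Q // 4 < d < 3 * Q // 4 else 0
```

The cost is dominated by the matrix-vector products over the unstructured matrix `A`, which is the kind of dense, data-parallel arithmetic that maps naturally onto a GPU.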
PLASM-PH Apr 26, 2023
Unsupervised classification of fully kinetic simulations of plasmoid instability using Self-Organizing Maps (SOMs)
Sophia Köhne, Elisabetta Boella, Maria Elena Innocenti
The growing amount of data produced by simulations and observations of space physics processes encourages the use of methods rooted in Machine Learning for data analysis and physical discovery. We apply a clustering method based on Self-Organizing Maps (SOM) to fully kinetic simulations of plasmoid instability, with the aim of assessing its suitability as a reliable analysis tool for both simulated and observed data. We obtain clusters that map well, a posteriori, to our knowledge of the process: the clusters clearly identify the inflow region, the inner plasmoid region, the separatrices, and regions associated with plasmoid merging. SOM-specific analysis tools, such as feature maps and the Unified Distance Matrix, provide valuable insights into both the physics at work and specific spatial regions of interest. The method appears to be a promising option for the analysis of data from both simulations and observations, and could also potentially be used to trigger the switch to different simulation models or resolutions in coupled codes for space simulations.
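For readers unfamiliar with SOMs, the core training loop can be sketched as follows. This is a generic textbook SOM under assumed choices (a rectangular grid, a Gaussian neighbourhood, linearly decaying learning rate and radius); the function names and default values are hypothetical and do not come from the paper.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid coordinate whose weight vector is closest to x."""
    return min(weights, key=lambda node: math.dist(weights[node], x))


def som_step(weights, x, lr, sigma):
    """Pull the best matching unit and its grid neighbours towards x."""
    bmu = best_matching_unit(weights, x)
    for node, w in list(weights.items()):
        grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
        h = math.exp(-grid_d2 / (2.0 * sigma ** 2))  # Gaussian neighbourhood
        weights[node] = [wi + lr * h * (xi - wi) for wi, xi in zip(w, x)]
    return bmu


def train_som(data, rows, cols, epochs=20, lr0=0.5, sigma0=1.5, seed=0):
    """Train a rows x cols map on a list of equal-length feature vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs          # linear decay schedule
        lr = lr0 * frac
        sigma = max(sigma0 * frac, 0.1)      # keep the radius positive
        for x in rng.sample(data, len(data)):
            som_step(weights, x, lr, sigma)
    return weights
```

After training, each data point is assigned to its best matching unit, and the trained nodes can then be grouped into clusters or inspected through feature maps.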
QUANT-PH Aug 6, 2025
Dynamic Solutions for Hybrid Quantum-HPC Resource Allocation
Roberto Rocco, Simone Rizzo, Matteo Barbieri et al.
The integration of quantum computers within classical High-Performance Computing (HPC) infrastructures is receiving increasing attention, with the former expected to serve as accelerators for specific computational tasks. However, combining HPC and quantum computers presents significant technical challenges, including resource allocation. This paper presents a novel malleability-based approach, alongside a workflow-based strategy, to optimize resource utilization in hybrid HPC-quantum workloads. With both these approaches, we can release classical resources when computations are offloaded to the quantum computer and reallocate them once quantum processing is complete. Our experiments with a hybrid HPC-quantum use case show the benefits of dynamic allocation, highlighting the potential of those solutions.
86.1 PLASM-PH Mar 17
Accelerating the Particle-In-Cell code ECsim with OpenACC
Elisabetta Boella, Nitin Shukla, Filippo Spiga et al.
The Particle-In-Cell (PIC) method is a computational technique widely used in plasma physics to model plasmas at the kinetic level. In this work, we present our effort to prepare the semi-implicit energy-conserving PIC code ECsim for exascale architectures. To achieve this, we adopted a pragma-based acceleration strategy using OpenACC, which enables high performance while requiring minimal code restructuring. On the pre-exascale Leonardo system, the accelerated code achieves a $5 \times$ speedup and a $3 \times$ reduction in energy consumption compared to the CPU reference code. Performance comparisons across multiple NVIDIA GPU generations show substantial benefits from the GH200 unified memory architecture. Finally, strong and weak scaling tests on Leonardo demonstrate efficiency of $70 \%$ and $78 \%$ up to 64 and 1024 GPUs, respectively.
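The strong and weak scaling efficiencies quoted above are usually computed from wall-clock times as in the sketch below. The abstract does not state which baseline configuration was used, so the function names and the choice of reference run are assumptions.

```python
def strong_scaling_efficiency(t_base, n_base, t_n, n):
    """Fixed total problem size: ideal time scales as 1/n.

    t_base is the run time on n_base devices, t_n the run time on n devices.
    """
    return (t_base * n_base) / (t_n * n)


def weak_scaling_efficiency(t_base, t_n):
    """Fixed problem size per device: ideal time stays constant."""
    return t_base / t_n
```

A value of 1.0 corresponds to ideal scaling in both cases; a strong scaling efficiency of 0.70 on 64 GPUs means the run is about 0.70 × 64 ≈ 45 times faster than on one GPU, assuming a single-GPU baseline.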
5.0 DC May 4
Assessing Performance and Porting Strategies for Gravitational $N$-Body Simulations on the RISC-V-Based Tenstorrent Wormhole™
Jenny Lynn Almerol, Elisabetta Boella, Mario Spera et al.
While RISC-V-based accelerators were initially designed with artificial intelligence applications in mind, they are increasingly being recognized as promising platforms for high performance scientific computing. In this work, we present three strategies for scaling an $N$-body code across multiple Tenstorrent Wormhole accelerators based on the RISC-V architecture. We assess the performance of these approaches by measuring both the execution time and the energy consumption required to complete a representative simulation, ultimately identifying the configuration that offers the most favorable balance between efficiency and performance.
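As an illustration of the kernel such a port has to distribute, here is a minimal sketch of the gravitational acceleration computation, assuming a direct-summation formulation with optional Plummer softening. The abstract does not confirm that the code uses this scheme, and the function name and parameters are hypothetical.

```python
def accelerations(pos, mass, G=1.0, eps=0.0):
    """Direct-sum accelerations: a_i = G * sum_j m_j (r_j - r_i) / |r_ij|^3.

    pos is a list of [x, y, z] positions, mass a list of masses, and eps an
    optional softening length that avoids singular close encounters.
    """
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = [pos[j][k] - pos[i][k] for k in range(3)]
            r2 = sum(c * c for c in d) + eps * eps
            inv_r3 = float(r2) ** -1.5
            for k in range(3):
                acc[i][k] += G * mass[j] * d[k] * inv_r3
    return acc
```

The outer loop over `i` is independent across particles, so one natural multi-device strategy is to give each accelerator a slice of the `i` range while every device reads all positions.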
DC Oct 28, 2025
Towards Exascale Computing for Astrophysical Simulation Leveraging the Leonardo EuroHPC System
Nitin Shukla, Alessandro Romeo, Caterina Caravita et al.
Developing and redesigning astrophysical, cosmological, and space plasma numerical codes for existing and next-generation accelerators is critical for enabling large-scale simulations. To address these challenges, the SPACE Center of Excellence (SPACE-CoE) fosters collaboration between scientists, code developers, and high-performance computing experts to optimize applications for the exascale era. This paper presents our strategy and initial results on the Leonardo system at CINECA for three flagship codes, namely gPLUTO, OpenGadget3 and iPIC3D, using profiling tools to analyze performance on single and multiple nodes. Preliminary tests show all three codes scale efficiently, reaching 80% scalability up to 1,024 GPUs.