Nitin Shukla

CR
3papers
6citations
Novelty18%
AI Score37

3 Papers

48.0CRMay 31
GPU Acceleration of Learning With Errors KEMs Using OpenACC for Post-Quantum Cryptography

Tiziana Liberati, Nitin Shukla, Matteo Barbieri et al.

Shor's algorithm proved that asymmetric cryptographic protocols based on the integer factorization and discrete logarithm problems are no longer safe in a world with large-scale quantum computers. As a result, Post-Quantum Cryptography (PQC) has been developed over the last few years, seeking cryptographic primitives resistant to quantum attacks. One of the main hard problems underlying PQC schemes is the Learning with Errors (LWE) problem, which is significantly more computationally intensive than its classical predecessors. In this work, we present a Key Encapsulation Mechanism (KEM) based on plain LWE and develop a GPU-oriented implementation using OpenACC. We evaluate the performance of our accelerated application in terms of both time-to-solution and energy-to-solution, considering bare-metal and containerized executions across multiple NVIDIA GPU models and generations. Our implementation achieves significant acceleration across all tested GPU platforms. In particular, on the NVIDIA Grace Hopper Superchip, it attains up to a $208\times$ speedup over a multithreaded CPU baseline and enables the execution of problem sizes that are impractical on CPU architectures due to memory and synchronization constraints. Energy consumption analysis also shows $\approx 2\times$ better efficiency when using the Superchip compared to systems equipped with x86-based CPUs and NVIDIA H100 GPUs. These results highlight the effectiveness of GPU acceleration for computationally demanding LWE-based cryptographic workloads.

86.1PLASM-PHMar 17
Accelerating the Particle-In-Cell code ECsim with OpenACC

Elisabetta Boella, Nitin Shukla, Filippo Spiga et al.

The Particle-In-Cell (PIC) method is a computational technique widely used in plasma physics to model plasmas at the kinetic level. In this work, we present our effort to prepare the semi-implicit energy-conserving PIC code ECsim for exascale architectures. To achieve this, we adopted a pragma-based acceleration strategy using OpenACC, which enables high performance while requiring minimal code restructuring. On the pre-exascale Leonardo system, the accelerated code achieves a $5 \times$ speedup and a $3 \times$ reduction in energy consumption compared to the CPU reference code. Performance comparisons across multiple NVIDIA GPU generations show substantial benefits from the GH200 unified memory architecture. Finally, strong and weak scaling tests on Leonardo demonstrate efficiency of $70 \%$ and $78 \%$ up to 64 and 1024 GPUs, respectively.

DCOct 28, 2025
Towards Exascale Computing for Astrophysical Simulation Leveraging the Leonardo EuroHPC System

Nitin Shukla, Alessandro Romeo, Caterina Caravita et al.

Developing and redesigning astrophysical, cosmological, and space plasma numerical codes for existing and next-generation accelerators is critical for enabling large-scale simulations. To address these challenges, the SPACE Center of Excellence (SPACE-CoE) fosters collaboration between scientists, code developers, and high-performance computing experts to optimize applications for the exascale era. This paper presents our strategy and initial results on the Leonardo system at CINECA for three flagship codes, namely gPLUTO, OpenGadget3 and iPIC3D, using profiling tools to analyze performance on single and multiple nodes. Preliminary tests show all three codes scale efficiently, reaching 80% scalability up to 1,024 GPUs.