Alessandro Celestini

DC
3 papers
5 citations
Novelty 23%
AI Score 35

3 Papers

DC · Apr 15
On the energy efficiency of sparse matrix computations on multi-GPU clusters

Massimo Bernaschi, Alessandro Celestini, Pasqua D'Ambra et al.

We investigate the energy efficiency of a library designed for parallel computations with sparse matrices. The library leverages high-performance, energy-efficient Graphics Processing Unit (GPU) accelerators to enable large-scale scientific applications. Our primary development objective was to maximize parallel performance and scalability in solving sparse linear systems whose dimensions far exceed the memory capacity of a single node. To this end, we devised methods that expose a high degree of parallelism while optimizing algorithmic implementations for efficient multi-GPU usage. Previous work has already demonstrated the library's performance efficiency on large-scale systems comprising thousands of NVIDIA GPUs, achieving improvements over state-of-the-art solutions. In this paper, we extend those results by providing energy profiles that address the growing sustainability requirements of modern HPC platforms. We present our methodology and tools for accurate runtime energy measurements of the library's core components and discuss the findings. Our results confirm that optimizing GPU computations and minimizing data movement across memory and computing nodes reduce both time-to-solution and energy consumption. Moreover, we show that the library delivers substantial advantages over comparable software frameworks on standard benchmarks.
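
As a rough illustration of what a runtime energy measurement involves, energy-to-solution can be estimated by integrating sampled power over the run time. The sketch below is not the authors' tooling: the function name `energy_joules` and the sample values are hypothetical, and it assumes power readings (in watts) have already been collected at known timestamps.

```python
def energy_joules(timestamps_s, power_w):
    """Integrate power samples (W) over time (s) with the trapezoidal rule.

    Returns the estimated energy in joules.
    """
    if len(timestamps_s) != len(power_w):
        raise ValueError("timestamps and power samples must have equal length")
    energy = 0.0
    for i in range(1, len(timestamps_s)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        # Average power over the interval times its duration.
        energy += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return energy


# Hypothetical samples: 200 W held for 2 s, then a ramp to 300 W over 1 s.
t = [0.0, 1.0, 2.0, 3.0]
p = [200.0, 200.0, 200.0, 300.0]
print(energy_joules(t, p))  # 200 + 200 + 250 = 650.0 J
```

Under this view, a shorter time-to-solution lowers energy only if the average power draw does not rise proportionally, which is why the two quantities are reported together.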

NA · Jan 7, 2025
Communication-reduced Conjugate Gradient Variants for GPU-accelerated Clusters

Massimo Bernaschi, Mauro G. Carrozzo, Alessandro Celestini et al.

Linear solvers are key components in any software platform for scientific and engineering computing. The solution of large and sparse linear systems lies at the core of physics-driven numerical simulations relying on partial differential equations (PDEs) and often represents a significant bottleneck in data-driven procedures, such as scientific machine learning. In this paper, we present an efficient implementation of the preconditioned s-step Conjugate Gradient (CG) method, originally proposed by Chronopoulos and Gear in 1989, for large clusters of NVIDIA GPU-accelerated computing nodes. The method, often referred to as communication-reduced or communication-avoiding CG, reduces global synchronizations and data communication steps compared to the standard approach, enhancing strong and weak scalability on parallel computers. Our main contribution is the design of a parallel solver that fully exploits the aggregation of low-granularity operations inherent to the s-step CG method to leverage the high throughput of GPU accelerators. Additionally, it applies overlap between data communication and computation in the multi-GPU sparse matrix-vector product. Experiments on classic benchmark datasets, derived from the discretization of the Poisson PDE, demonstrate the potential of the method.
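
To make the "standard approach" concrete, here is a minimal, unpreconditioned, single-process CG sketch on a 1D Poisson matrix. It is a baseline for comparison only, not the s-step method or the authors' solver. The helper names and the `reductions` counter are assumptions added for illustration: each dot product marked below would require a global reduction across nodes in a distributed run, and s-step variants aim to aggregate such reductions over several iterations.

```python
def poisson1d_matvec(x):
    """Return A @ x for the 1D Poisson matrix tridiag(-1, 2, -1)."""
    n = len(x)
    y = [0.0] * n
    for i in range(n):
        y[i] = 2.0 * x[i]
        if i > 0:
            y[i] -= x[i - 1]
        if i < n - 1:
            y[i] -= x[i + 1]
    return y


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cg(matvec, b, tol=1e-10, max_iter=1000):
    """Standard CG from x0 = 0; returns (x, iterations, reductions)."""
    x = [0.0] * len(b)
    r = list(b)          # residual b - A @ x0 with x0 = 0
    p = list(r)          # initial search direction
    rr = dot(r, r)       # reduction for the initial residual norm
    rr0 = rr
    reductions = 1
    iterations = 0
    for _ in range(max_iter):
        if rr <= tol * tol * rr0:
            break
        Ap = matvec(p)
        alpha = rr / dot(p, Ap)                      # global reduction 1
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rr_new = dot(r, r)                           # global reduction 2
        beta = rr_new / rr
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rr = rr_new
        reductions += 2
        iterations += 1
    return x, iterations, reductions


n = 50
x, iters, reds = cg(poisson1d_matvec, [1.0] * n)
print(iters, reds)
```

The two reductions per iteration are synchronization points; on large clusters their latency, rather than floating-point work, tends to limit scalability.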

SI · May 18
Epidemics in a Synthetic Urban Population with Multiple Levels of Mixing

Alessandro Celestini, Francesca Colaiori, Stefano Guarino et al.

Network-based epidemic models that account for heterogeneous contact patterns are extensively used to predict and control the diffusion of infectious diseases. We use census and survey data to reconstruct a geo-referenced and age-stratified synthetic urban population connected by stable social relations. We consider two kinds of interactions, distinguishing daily (household) contacts from other frequent contacts. Moreover, we allow any pair of individuals to have rare fortuitous interactions. We simulate the epidemic diffusion on a synthetic urban network for a typical medium-size Italian city and characterize the outbreak speed, pervasiveness, and predictability in terms of the socio-demographic and geographic features of the host population. Introducing age-structured contact patterns results in faster and more pervasive outbreaks, while assuming that the interaction frequency decays with distance has only negligible effects. Preliminary evidence shows the existence of patterns of hierarchical spatial diffusion in urban areas, with two regimes for epidemic spread in low- and high-density regions.
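
The three levels of mixing described above (household contacts, other frequent contacts, rare fortuitous encounters) can be sketched as a toy discrete-time SIR simulation. This is an illustrative simplification, not the authors' model: it has no age stratification or geography, and the function name, the transmission probabilities, and the tiny synthetic population are all hypothetical.

```python
import random


def simulate_sir(households, friends, n, p_house, p_friend, p_random,
                 p_recover, seed=0, max_steps=10_000):
    """Toy SIR on a three-layer contact structure.

    Returns a list of (S, I, R) counts, one tuple per time step.
    """
    rng = random.Random(seed)
    # Household layer: everyone in a household is linked to everyone else in it.
    house_nbrs = {i: [] for i in range(n)}
    for members in households:
        for i in members:
            house_nbrs[i].extend(j for j in members if j != i)

    state = ["S"] * n
    state[rng.randrange(n)] = "I"   # one random initial case
    history = []
    for _ in range(max_steps):
        infected = [i for i in range(n) if state[i] == "I"]
        history.append((state.count("S"), len(infected), state.count("R")))
        if not infected:
            break
        newly = set()
        for i in infected:
            for j in house_nbrs[i]:            # daily household contacts
                if state[j] == "S" and rng.random() < p_house:
                    newly.add(j)
            for j in friends.get(i, ()):       # other frequent contacts
                if state[j] == "S" and rng.random() < p_friend:
                    newly.add(j)
            j = rng.randrange(n)               # one rare fortuitous encounter
            if j != i and state[j] == "S" and rng.random() < p_random:
                newly.add(j)
        for i in infected:
            if rng.random() < p_recover:
                state[i] = "R"
        for j in newly:
            state[j] = "I"
    return history


# Hypothetical population: 20 people in 5 households of 4, plus a ring of friendships.
n = 20
households = [list(range(k, k + 4)) for k in range(0, n, 4)]
friends = {i: [(i + 4) % n, (i - 4) % n] for i in range(n)}
history = simulate_sir(households, friends, n, p_house=0.3, p_friend=0.1,
                       p_random=0.01, p_recover=0.3, seed=1)
print(history[-1])
```

Varying the three transmission probabilities independently gives a feel for how each mixing level contributes to outbreak speed and final size in such a toy setting.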