Salvatore Filippone

AR, Nov 29, 2023

A Survey on Design Methodologies for Accelerating Deep Learning on Heterogeneous Architectures

Serena Curzel, Fabrizio Ferrandi, Leandro Fiorin et al.

As deep neural networks grow in size and complexity, their efficient execution has become increasingly pressing in the design of heterogeneous High-Performance Computing (HPC) and edge platforms, leading to a wide variety of proposals for specialized deep learning architectures and hardware accelerators. The design of such architectures and accelerators requires a multidisciplinary approach combining expertise from several areas, from machine learning to computer architecture, low-level hardware design, and approximate computing. Several methodologies and tools have been proposed to improve the process of designing accelerators for deep learning, aimed at maximizing parallelism and minimizing data movement to achieve high performance and energy efficiency. This paper critically reviews influential tools and design methodologies for deep learning accelerators, offering a wide perspective on this rapidly evolving field. This work complements surveys on architectures and accelerators by covering hardware-software co-design, automated synthesis, domain-specific compilers, design space exploration, modeling, and simulation, providing insights into technical challenges and open research directions.

21.1, NA, Apr 30

Parallel matching-based AMG preconditioners for elliptic equations discretized by IgA

Pasqua D'Ambra, Fabio Durastante, Salvatore Filippone

Isogeometric analysis (IgA) offers enhanced approximation capabilities for the discretization of elliptic boundary-value problems, yet it results in large, sparse, and increasingly ill-conditioned linear systems due to higher interconnectivity among degrees of freedom. In particular, the discretization with tensor-product B-splines or NURBS of degree $p$ on a mesh with $n$ elements per parametric direction leads to symmetric positive-definite systems of the form $K\mathbf{u} = \mathbf{F}$, where the matrix bandwidth and condition number scale unfavorably with both $p$ and spatial dimension $d$. To address the computational challenges posed by such systems, especially in three-dimensional or high-order scenarios, Krylov subspace methods with specialized preconditioners become essential. This paper investigates the efficacy of algebraic multigrid (AMG) preconditioners tailored for IgA-based discretizations, with a focus on performance in modern high-performance computing (HPC) environments. Leveraging the Parallel Sparse Computation Toolkit (PSCToolkit), we explore distributed-memory and GPU-accelerated strategies for solving large-scale problems. The study assesses algorithmic efficiency and scalability across a range of benchmark tests. The results demonstrate that AMG preconditioners can achieve robust and scalable performance, confirming their potential as practical solvers for large IgA systems in engineering and scientific applications.
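
To make the "Krylov method plus preconditioner" idea in this abstract concrete, the following is a minimal sketch of preconditioned conjugate gradients for a symmetric positive-definite system $K\mathbf{u} = \mathbf{F}$. It is an illustration only: it does not use PSCToolkit, the test matrix is a small hypothetical tridiagonal system rather than an IgA discretization, and a simple Jacobi (diagonal) preconditioner stands in for the AMG preconditioners studied in the paper. All function names (`matvec`, `pcg`, `jacobi`) are assumptions made for this example.

```python
def matvec(diag, off, x):
    """Multiply a symmetric tridiagonal matrix by x.

    `diag` is the main diagonal, `off` the first off-diagonal.
    """
    n = len(x)
    y = [diag[i] * x[i] for i in range(n)]
    for i in range(n - 1):
        y[i] += off[i] * x[i + 1]
        y[i + 1] += off[i] * x[i]
    return y


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def pcg(diag, off, f, apply_prec, tol=1e-10, max_iter=1000):
    """Preconditioned conjugate gradients for K u = f, starting from u = 0.

    `apply_prec(r)` returns an approximation of K^{-1} r; in the paper's
    setting this role is played by an AMG cycle, here by a stand-in.
    """
    u = [0.0] * len(f)
    r = list(f)                      # residual f - K u for u = 0
    z = apply_prec(r)
    p = list(z)
    rz = dot(r, z)
    f_norm = dot(f, f) ** 0.5
    for it in range(1, max_iter + 1):
        Kp = matvec(diag, off, p)
        alpha = rz / dot(p, Kp)
        u = [ui + alpha * pi for ui, pi in zip(u, p)]
        r = [ri - alpha * kpi for ri, kpi in zip(r, Kp)]
        if dot(r, r) ** 0.5 <= tol * f_norm:
            return u, it
        z = apply_prec(r)
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return u, max_iter


# Hypothetical SPD test problem: strictly diagonally dominant tridiagonal matrix.
n = 50
diag = [2.0 + 0.1 * i for i in range(n)]
off = [-1.0] * (n - 1)
f = [1.0] * n


def jacobi(r):
    # Stand-in preconditioner: divide by the diagonal of K (NOT AMG).
    return [ri / di for ri, di in zip(r, diag)]


u, iters = pcg(diag, off, f, jacobi)
residual = [fi - ki for fi, ki in zip(f, matvec(diag, off, u))]
```

Because the preconditioner enters only through `apply_prec`, a multigrid cycle could in principle be substituted for `jacobi` without changing the Krylov loop, which is the separation of concerns the abstract relies on.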
