Filippo Mantovani

3 Papers

61.2 AR Apr 14
EPAC: The Last Dance

Filippo Mantovani, Fabio Banchelli, Pablo Vizcaino et al.

This paper presents EPAC, a RISC-V-based accelerator chip developed within the European Processor Initiative (EPI) as part of a multi-year, multi-partner effort to build a European HPC processor ecosystem. EPAC is implemented in GlobalFoundries 22FDX (GF22FDX) technology, occupies 27 mm² with approximately 0.3 billion transistors, and integrates three distinct RISC-V compute tiles targeting different workload classes: VEC, a vector processing tile for double-precision HPC workloads; STX, a many-core tile optimized for stencil and machine learning computations; and VRP, a variable-precision tile for iterative numerical solvers requiring extended floating-point formats. All tiles are connected through a Coherent Hub Interface (CHI) based network-on-chip with a distributed L2 cache system and communicate with external memory via a SerDes link. The chip was taped out and successfully brought up, with all major IP blocks validated. This paper describes the architecture of each tile and the uncore infrastructure, the integration and physical implementation process, and the board-level bring-up activities. It also reflects on the engineering and coordination lessons learned from a full chip design effort distributed across academic and industrial partners in Europe.

LG Sep 11, 2023
Compressed Real Numbers for AI: a case-study using a RISC-V CPU

Federico Rossi, Marco Cococcioni, Roger Ferrer Ibàñez et al.

As recently demonstrated, Deep Neural Networks (DNNs), usually trained using single-precision IEEE 754 floating-point numbers (binary32), can also work using lower precision. Therefore, 16-bit and 8-bit compressed formats have attracted considerable attention. In this paper, we focus on two families of formats that have already achieved interesting results in compressing binary32 numbers in machine learning applications without appreciable degradation of accuracy: bfloat and posit. Even though 16-bit and 8-bit bfloats/posits are routinely used to reduce the storage of the weights and biases of trained DNNs, inference still often happens on the 32-bit FPU of the CPU (especially if GPUs are not available). We propose a way to decompress a tensor of bfloats/posits just before computation, i.e., after the compressed operands have been loaded into the vector registers of a vector-capable CPU, in order to save bandwidth and increase cache efficiency. Finally, we show the architectural parameters and considerations under which this solution is advantageous with respect to the uncompressed one.
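The decompress-just-before-compute idea can be illustrated for the bfloat16 case, where a value is simply the upper 16 bits of its binary32 encoding. The following is a minimal sketch under stated assumptions, not the paper's implementation: NumPy arrays stand in for vector registers, compression uses plain truncation rather than round-to-nearest, posits are not covered, and the function names `compress_to_bfloat16` and `decompress_bfloat16` are hypothetical.

```python
import numpy as np


def compress_to_bfloat16(values):
    """Keep only the upper 16 bits of each binary32 value (truncation).

    Assumption: truncation is used for simplicity; a production
    converter would more likely round to nearest even.
    """
    as_f32 = np.ascontiguousarray(values, dtype=np.float32)
    bits = as_f32.view(np.uint32)            # reinterpret, no conversion
    return (bits >> np.uint32(16)).astype(np.uint16)


def decompress_bfloat16(packed):
    """Widen 16-bit bfloat16 patterns back to binary32.

    The 16 stored bits become the high half of a 32-bit word and the
    low half is zero-filled, so the operation is a shift per element.
    """
    wide = packed.astype(np.uint32) << np.uint32(16)
    return wide.view(np.float32)             # reinterpret as binary32


# Usage: store weights in 16 bits, widen them only when computing.
weights = np.array([1.0, -2.5, 0.15625, 3.14159], dtype=np.float32)
packed = compress_to_bfloat16(weights)       # half the bytes in memory
restored = decompress_bfloat16(packed)       # binary32 again for the FPU
result = restored * np.float32(2.0)          # arithmetic stays in binary32
```

Because the compressed array occupies half the bytes of the binary32 one, the same cache line or memory transfer carries twice as many operands, which is the bandwidth and cache benefit the abstract refers to; the cost is the extra widening step before each computation.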

39.2 ET May 8
Post-Moore Technologies for Plasma Simulation: A Community Roadmap

Luca Pennati, Erik M. Åsgrim, Jeremy J. Williams et al.

Plasma simulations are among the most computationally demanding scientific workloads, combining high-dimensional kinetic evolution, particle-mesh coupling, field solves, and data-intensive communication. As general-purpose processor scaling slows, post-Moore technologies are being explored to address bottlenecks in data movement, memory access, and power consumption. This paper provides a community perspective on the role of these technologies in plasma simulation, assessing three major classes: reconfigurable and data-path accelerators, non-von Neumann architectures, and quantum computing. Each is evaluated, in a co-design approach, against representative plasma workloads spanning particle-in-cell, continuum Vlasov, gyrokinetic, fluid/MHD, hybrid, and warm dense matter methods. We find that no single technology can replace existing HPC platforms. Instead, three tiers of opportunity emerge: FPGA-class and data-path accelerators offer near-term kernel offload and workflow-level data services, non-von Neumann architectures represent medium-term directions for operator-level acceleration, and quantum computing, although the least mature, is potentially the most disruptive for warm dense matter and inertial confinement fusion microphysics. We outline best practices for selective adoption and identify focused demonstrators, benchmarking, and modular software ecosystems as immediate community priorities.