Artur Podobas

AR
h-index17
10papers
35citations
Novelty35%
AI Score48

10 Papers

MSJul 14, 2022
FFTc: An MLIR Dialect for Developing HPC Fast Fourier Transform Libraries

Yifei He, Artur Podobas, Måns I. Andersson et al.

Discrete Fourier Transform (DFT) libraries are one of the most critical software components for scientific computing. Inspired by FFTW, a widely used library for DFT HPC calculations, we apply compiler technologies for the development of HPC Fourier transform libraries. In this work, we introduce FFTc, a domain-specific language, based on Multi-Level Intermediate Representation (MLIR), for expressing Fourier Transform algorithms. We present the initial design, implementation, and preliminary results of FFTc.

ARMay 18
A Quarter of a Century of Neuromorphic Architectures on FPGAs -- an Overview

Wiktor J. Szczerek, Artur Podobas

Neuromorphic computing is a relatively new discipline of computer science, where the principles of biological brain's computation and memory are used to create a new way of processing information, based on networks of spiking neurons. Those networks can be implemented as both analog and digital implementations, where for the latter, the Field Programmable Gate Arrays (FPGAs) are a frequent choice, due to their inherent flexibility, allowing the researchers to easily design hardware neuromorphic architecture (NMAs). Moreover, digital NMAs show good promise in simulating various spiking neural networks because of their inherent accuracy and resilience to noise, as opposed to analog implementations. This paper presents an overview of digital NMAs implemented on FPGAs, with a goal of providing useful references to various architectural design choices to the researchers interested in digital neuromorphic systems. We present a taxonomy of NMAs that highlights groups of distinct architectural features, their advantages and disadvantages and identify trends and predictions for the future of those architectures.

ARMar 16
bitSMM: A bit-Serial Matrix Multiplication Accelerator

Pedro Antunes, Artur Podobas

Neural-network (NN) inference is increasingly present on-board spacecraft to reduce downlink bandwidth and enable timely decision making. However, the power and reliability constraints of space missions limit the applicability of many state-of-the-art NN accelerators. This paper presents bitSMM, a bit-serial matrix multiplication accelerator built around a systolic array of bit-serial multiply--accumulate (MAC) units. The design supports runtime-configurable operand precision from 1 to 16 bits and evaluates two MAC variants: a Booth-inspired architecture and a standard binary multiplication with correction architecture. We implement bitSMM in [System]Verilog and evaluate it on an AMD ZCU104 FPGA and through ASIC physical implementation using the asap7 and nangate45 process design kits. On the FPGA, bitSMM achieves up to 19.2~GOPS and 2.973~GOPS/W, and in asap7 it achieves up to 73.22~GOPS, 552~GOPS/mm$^2$, and 40.8~GOPS/W.

ARMar 14
Evaluating Four FPGA-accelerated Space Use Cases based on Neural Network Algorithms for On-board Inference

Pedro Antunes, Muhammad Ihsan Al Hafiz, Jonah Ekelund et al.

Space missions increasingly deploy high-fidelity sensors that produce data volumes exceeding onboard buffering and downlink capacity. This work evaluates FPGA acceleration of neural networks (NNs) across four space use cases on the AMD ZCU104 board. We use Vitis AI (AMD DPU) and Vitis HLS to implement inference, quantify throughput and energy, and expose toolchain and architectural constraints relevant to deployment. Vitis AI achieves up to 34.16$\times$ higher inference rate than the embedded ARM CPU baseline, while custom HLS designs reach up to 5.4$\times$ speedup and add support for operators (e.g., sigmoids, 3D layers) absent in the DPU. For these implementations, measured MPSoC inference power spans 1.5-6.75 W, reducing energy per inference versus CPU execution in all use cases. These results show that NN FPGA acceleration can enable onboard filtering, compression, and event detection, easing downlink pressure in future missions.

ARApr 30
NeuroRing: Scaling Spiking Neural Networks via Multi-FPGA Bidirectional Ring Topologies and Stream-Dataflow Architectures

Muhammad Ihsan Al Hafiz, Artur Podobas

Spiking neural networks (SNNs) are a promising paradigm for energy-efficient event-driven computation, but large-scale SNN execution remains challenging because sparse spike communication and synchronization can dominate runtime. Existing solutions across CPU, GPU, ASIC, and FPGA platforms offer different trade-offs between programmability, efficiency, and scalability. To address this gap, we present NeuroRing, a modular and scalable SNN accelerator based on a stream-dataflow architecture and a bidirectional ring topology, implemented in High-Level Synthesis (HLS) on programmable FPGAs. NeuroRing supports modular single- and multi-FPGA deployment and is compatible with existing SNN workflows through integration with the NEST simulator. We evaluate NeuroRing on the cortical microcircuit benchmark and a Sudoku constraint-satisfaction workload. Results show that NeuroRing preserves the key activity statistics of the NEST reference model, achieves faster-than-real-time execution of the full-scale cortical microcircuit with a real-time factor (RTF) of 0.83, exhibits meaningful strong and weak scaling, and provides competitive energy efficiency on two programmable FPGAs. These results position NeuroRing as a flexible and scalable platform for both neuroscience simulation and broader event-driven applications.

ARJun 23, 2025
Embedded FPGA Acceleration of Brain-Like Neural Networks: Online Learning to Scalable Inference

Muhammad Ihsan Al Hafiz, Naresh Ravichandran, Anders Lansner et al.

Edge AI applications increasingly require models that can learn and adapt on-device with minimal energy budget. Traditional deep learning models, while powerful, are often overparameterized, energy-hungry, and dependent on cloud connectivity. Brain-Like Neural Networks (BLNNs), such as the Bayesian Confidence Propagation Neural Network (BCPNN), propose a neuromorphic alternative by mimicking cortical architecture and biologically-constrained learning. They offer sparse architectures with local learning rules and unsupervised/semi-supervised learning, making them well-suited for low-power edge intelligence. However, existing BCPNN implementations rely on GPUs or datacenter FPGAs, limiting their applicability to embedded systems. This work presents the first embedded FPGA accelerator for BCPNN on a Zynq UltraScale+ SoC using High-Level Synthesis. We implement both online learning and inference-only kernels with support for variable and mixed precision. Evaluated on MNIST, Pneumonia, and Breast Cancer datasets, our accelerator achieves up to 17.5x latency and 94% energy savings over ARM baselines, without sacrificing accuracy. This work enables practical neuromorphic computing on edge devices, bridging the gap between brain-like learning and real-world deployment.

ARApr 22, 2025
FPGA-Based Neural Network Accelerators for Space Applications: A Survey

Pedro Antunes, Artur Podobas

Space missions are becoming increasingly ambitious, necessitating high-performance onboard spacecraft computing systems. In response, field-programmable gate arrays (FPGAs) have garnered significant interest due to their flexibility, cost-effectiveness, and radiation tolerance potential. Concurrently, neural networks (NNs) are being recognized for their capability to execute space mission tasks such as autonomous operations, sensor data analysis, and data compression. This survey serves as a valuable resource for researchers aiming to implement FPGA-based NN accelerators in space applications. By analyzing existing literature, identifying trends and gaps, and proposing future research directions, this work highlights the potential of these accelerators to enhance onboard computing systems.

LGJul 14, 2021
Higgs Boson Classification: Brain-inspired BCPNN Learning with StreamBrain

Martin Svedin, Artur Podobas, Steven W. D. Chien et al.

One of the most promising approaches for data analysis and exploration of large data sets is Machine Learning techniques that are inspired by brain models. Such methods use alternative learning rules potentially more efficiently than established learning rules. In this work, we focus on the potential of brain-inspired ML for exploiting High-Performance Computing (HPC) resources to solve ML problems: we discuss the BCPNN and an HPC implementation, called StreamBrain, its computational cost, suitability to HPC systems. As an example, we use StreamBrain to analyze the Higgs Boson dataset from High Energy Physics and discriminate between background and signal classes in collisions of high-energy particle colliders. Overall, we reach up to 69.15% accuracy and 76.4% Area Under the Curve (AUC) performance.

DCJun 9, 2021
StreamBrain: An HPC Framework for Brain-like Neural Networks on CPUs, GPUs and FPGAs

Artur Podobas, Martin Svedin, Steven W. D. Chien et al.

The modern deep learning method based on backpropagation has surged in popularity and has been used in multiple domains and application areas. At the same time, there are other -- less-known -- machine learning algorithms with a mature and solid theoretical foundation whose performance remains unexplored. One such example is the brain-like Bayesian Confidence Propagation Neural Network (BCPNN). In this paper, we introduce StreamBrain -- a framework that allows neural networks based on BCPNN to be practically deployed in High-Performance Computing systems. StreamBrain is a domain-specific language (DSL), similar in concept to existing machine learning (ML) frameworks, and supports backends for CPUs, GPUs, and even FPGAs. We empirically demonstrate that StreamBrain can train the well-known ML benchmark dataset MNIST within seconds, and we are the first to demonstrate BCPNN on STL-10 size networks. We also show how StreamBrain can be used to train with custom floating-point formats and illustrate the impact of using different bfloat variations on BCPNN using FPGAs.

COMP-PHOct 11, 2020
Automatic Particle Trajectory Classification in Plasma Simulations

Stefano Markidis, Ivy Peng, Artur Podobas et al.

Numerical simulations of plasma flows are crucial for advancing our understanding of microscopic processes that drive the global plasma dynamics in fusion devices, space, and astrophysical systems. Identifying and classifying particle trajectories allows us to determine specific on-going acceleration mechanisms, shedding light on essential plasma processes. Our overall goal is to provide a general workflow for exploring particle trajectory space and automatically classifying particle trajectories from plasma simulations in an unsupervised manner. We combine pre-processing techniques, such as Fast Fourier Transform (FFT), with Machine Learning methods, such as Principal Component Analysis (PCA), k-means clustering algorithms, and silhouette analysis. We demonstrate our workflow by classifying electron trajectories during magnetic reconnection problem. Our method successfully recovers existing results from previous literature without a priori knowledge of the underlying system. Our workflow can be applied to analyzing particle trajectories in different phenomena, from magnetic reconnection, shocks to magnetospheric flows. The workflow has no dependence on any physics model and can identify particle trajectories and acceleration mechanisms that were not detected before.