Abrarul Karim

48.9 DC Mar 31

Exploration of Energy and Throughput Tradeoffs for Dataflow Networks

Abrarul Karim, Joachim Falk, Jürgen Teich

The introduction of dynamic power management strategies such as clock gating and power gating in dataflow networks has been shown to provide significant energy savings when applied during idle times. However, these strategies can also degrade throughput due to shutdown and wake-up delays. Such throughput degradation can be particularly detrimental to signal processing systems that require a guaranteed throughput. As a solution, this paper first contributes a linear-program formulation for finding a periodic maximal-throughput schedule of a given so-called self-powering dataflow network, in which actors, realized in hardware, are allowed to go to sleep whenever they are not enabled to fire. Depending on which actors are allowed to power down, tradeoffs between throughput and energy savings can be obtained. As a second contribution, we propose a mixed-integer-linear-program formulation to determine a periodic schedule that satisfies a given throughput while minimizing the overall energy per period by identifying which actors are allowed to power down during idle phases and which are not. As a third contribution, we propose a multi-objective design-space exploration strategy called "Hop and Skip" to efficiently explore the Pareto front of energy and throughput solutions. Experimental evaluations on a set of existing benchmarks and randomly generated graphs demonstrate significant exploration time reductions over a brute-force sweep. Finally, a real-world case study is presented, and we report the achievable energy savings and throughputs of the related dataflow network for three cases: (a) all actors always active, (b) all actors self-powered, and (c) all optimal energy and throughput tradeoff points found by the proposed design-space exploration strategy.
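To make the notion of a Pareto front of energy and throughput solutions concrete, the following minimal Python sketch filters a list of candidate design points down to the non-dominated ones. It is not the "Hop and Skip" strategy from the paper, whose details the abstract does not give; it only illustrates the result a brute-force sweep would have to produce. The function name `pareto_front`, the tuple layout, and the sample values are assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical representation: (energy per period, throughput), arbitrary units.
Point = Tuple[float, float]


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the points not dominated by any other point.

    A point dominates another if it needs no more energy and delivers
    no less throughput, and is strictly better in at least one of the two.
    """
    front: List[Point] = []
    best_throughput = float("-inf")
    # Sort by ascending energy; for equal energy, highest throughput first.
    for energy, throughput in sorted(set(points), key=lambda p: (p[0], -p[1])):
        # Keep a point only if it beats every cheaper point on throughput.
        if throughput > best_throughput:
            front.append((energy, throughput))
            best_throughput = throughput
    return front


# Made-up sample points, purely illustrative.
candidates = [(10.0, 0.5), (8.0, 0.4), (12.0, 0.5), (6.0, 0.2), (9.0, 0.3)]
front = pareto_front(candidates)
print(front)
```

Each surviving point represents one tradeoff: moving along the front to a higher throughput always costs more energy per period.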

LG Apr 28, 2025

Hardware/Software Co-Design of RISC-V Extensions for Accelerating Sparse DNNs on FPGAs

Muhammad Sabih, Abrarul Karim, Jakob Wittmann et al.

The customizability of RISC-V makes it an attractive choice for accelerating deep neural networks (DNNs). Such acceleration can be achieved through instruction set extensions and corresponding custom functional units. Yet, efficiently exploiting these opportunities requires a hardware/software co-design approach in which the DNN model, software, and hardware are designed together. In this paper, we propose novel RISC-V extensions for accelerating DNN models containing semi-structured and unstructured sparsity. While the idea of accelerating structured and unstructured pruning is not new, our novel design offers various advantages over other designs. To exploit semi-structured sparsity, we take advantage of the fine-grained (bit-level) configurability of FPGAs and suggest reserving a few bits in a block of DNN weights to encode the information about sparsity in the succeeding blocks. The proposed custom functional unit uses this information to skip computations. To exploit unstructured sparsity, we propose a variable-cycle sequential multiply-and-accumulate unit that performs only as many multiplications as there are non-zero weights. Our implementations of unstructured and semi-structured pruning accelerators can provide speedups of up to a factor of 3 and 4, respectively. We then propose a combined design that can accelerate both types of sparsity, providing speedups of up to a factor of 5. Our designs consume a small amount of additional FPGA resources, so the resulting co-designs enable the acceleration of DNNs even on small FPGAs. We benchmark our designs on standard TinyML applications such as keyword spotting, image classification, and person detection.
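The behavior of a variable-cycle sequential multiply-and-accumulate unit can be sketched in software as follows. This is a behavioral illustration only, not the authors' hardware design: it assumes one multiplication per cycle and that zero weights are detected at no cycle cost, and the function name `sparse_mac` and the sample values are hypothetical.

```python
from typing import Sequence, Tuple


def sparse_mac(weights: Sequence[int], activations: Sequence[int]) -> Tuple[int, int]:
    """Accumulate weight * activation, skipping zero weights.

    Returns the accumulated sum and the number of multiplications issued,
    which under the one-multiplication-per-cycle assumption is the cycle count.
    """
    acc = 0
    cycles = 0
    for w, a in zip(weights, activations):
        if w == 0:
            continue  # zero weight: no multiplication is issued
        acc += w * a
        cycles += 1  # assumed: one multiplication per cycle
    return acc, cycles


# Made-up block of four weights with two zeros.
result, cycles = sparse_mac([0, 2, 0, -1], [5, 3, 7, 4])
print(result, cycles)
```

In this model the cycle count equals the number of non-zero weights, so a block with half its weights pruned takes half as many multiplication cycles as a dense block.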

2 Papers