AR · LG · Feb 13, 2023

OpenHLS: High-Level Synthesis for Low-Latency Deep Neural Networks for Experimental Science

Maksim Levental, Arham Khan, Ryan Chard, Kazutomo Yoshii, Kyle Chard, Ian Foster

arXiv:2302.06751v42.35 citations · h-index: 36 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the need for low-latency processing in data acquisition systems for domains such as high-energy physics. It is an incremental advance because it builds on existing high-level synthesis techniques.

The authors address the problem of deploying deep neural networks for real-time data filtering in high-data-rate scientific experiments. They develop OpenHLS, an open-source compiler framework that translates high-level network representations into low-level hardware implementations suitable for near-sensor devices such as field-programmable gate arrays. In a Bragg peak detection case study, the generated implementation achieves a throughput of 4.8 μs/sample, approximately a 4× improvement over the existing implementation.

In many experiment-driven scientific domains, such as high-energy physics, material science, and cosmology, high data rate experiments impose hard constraints on data acquisition systems: collected data must either be indiscriminately stored for post-processing and analysis, thereby necessitating large storage capacity, or accurately filtered in real-time, thereby necessitating low-latency processing. Deep neural networks, effective in other filtering tasks, have not been widely employed in such data acquisition systems, due to design and deployment difficulties. We present an open source, lightweight, compiler framework, without any proprietary dependencies, OpenHLS, based on high-level synthesis techniques, for translating high-level representations of deep neural networks to low-level representations, suitable for deployment to near-sensor devices such as field-programmable gate arrays. We evaluate OpenHLS on various workloads and present a case-study implementation of a deep neural network for Bragg peak detection in the context of high-energy diffraction microscopy. We show OpenHLS is able to produce an implementation of the network with a throughput of 4.8 $\mu$s/sample, which is approximately a 4$\times$ improvement over the existing implementation.
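To put the reported figure in more familiar units, the arithmetic can be sketched as follows. This is only an illustration: it assumes one sample is completed every 4.8 μs, and it treats the "approximately 4×" improvement as exactly 4, so the implied baseline is a rough estimate and not a number stated in the abstract. The function and constant names are hypothetical.

```python
# Figures quoted in the abstract.
US_PER_SAMPLE = 4.8  # reported per-sample time, in microseconds
SPEEDUP = 4.0        # "approximately 4x"; treated as exact here (assumption)


def samples_per_second(us_per_sample: float) -> float:
    """Convert a per-sample time in microseconds to a rate in samples/second."""
    return 1e6 / us_per_sample


def implied_baseline_us(us_per_sample: float, speedup: float) -> float:
    """Per-sample time the earlier implementation would need, given the speedup."""
    return us_per_sample * speedup


rate = samples_per_second(US_PER_SAMPLE)
baseline = implied_baseline_us(US_PER_SAMPLE, SPEEDUP)

print(f"OpenHLS rate: about {rate:,.0f} samples/s")
print(f"Implied baseline: about {baseline:.1f} us/sample")
```

Under these assumptions the quoted figure corresponds to roughly 208,000 samples per second, and the earlier implementation would take on the order of 19 μs per sample.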
