LG · Sep 8, 2024

Low Latency Transformer Inference on FPGAs for Physics Applications with hls4ml

arXiv:2409.05207v1 · 9 citations · h-index: 61
Originality: Incremental advance
AI Analysis

The implementation enables low-latency, real-time processing for physics applications such as high energy physics and LIGO, though the advance is incremental: it adapts existing methods to new hardware.

The study tackled efficient transformer inference on FPGAs using hls4ml, achieving latencies under 2 microseconds for real-time physics applications.

This study presents an efficient implementation of transformer architectures on Field-Programmable Gate Arrays (FPGAs) using hls4ml. We demonstrate a strategy for implementing the multi-head attention, softmax, and normalization layers, and evaluate three distinct models. Deployed on a VU13P FPGA chip, the models achieved latencies below 2 µs, demonstrating their potential for real-time applications. The compatibility of hls4ml with any TensorFlow-built transformer model further enhances the scalability and applicability of this work.

Index Terms: FPGAs, machine learning, transformers, high energy physics, LIGO
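As a reference for the three layers the abstract names, the following is a minimal floating-point NumPy sketch of multi-head attention, softmax, and layer normalization. It is not the paper's HLS implementation and does not model fixed-point arithmetic or FPGA resource constraints. The function names, tensor shapes, random weights, and the residual connection in the last line are assumptions chosen for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the maximum along the axis for numerical stability.
    shifted = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / np.sum(e, axis=axis, keepdims=True)


def layer_norm(x, eps=1e-5):
    # Normalize each token over its feature dimension (no learned scale/shift).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def multi_head_attention(x, wq, wk, wv, wo, num_heads):
    # x: (seq_len, d_model); all weight matrices: (d_model, d_model).
    seq_len, d_model = x.shape
    d_head = d_model // num_heads

    def split_heads(t):
        # (seq_len, d_model) -> (num_heads, seq_len, d_head)
        return t.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split_heads(x @ wq), split_heads(x @ wk), split_heads(x @ wv)
    # Scaled dot-product scores per head: (num_heads, seq_len, seq_len).
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)
    context = weights @ v  # (num_heads, seq_len, d_head)
    # Concatenate heads back to (seq_len, d_model), then project.
    merged = context.transpose(1, 0, 2).reshape(seq_len, d_model)
    return merged @ wo, weights


# Hypothetical toy dimensions, not taken from the paper.
rng = np.random.default_rng(0)
seq_len, d_model, num_heads = 4, 8, 2
x = rng.normal(size=(seq_len, d_model))
wq, wk, wv, wo = (rng.normal(size=(d_model, d_model)) for _ in range(4))

out, weights = multi_head_attention(x, wq, wk, wv, wo, num_heads)
y = layer_norm(x + out)  # residual connection followed by normalization
```

Each row of `weights` sums to one, and each row of `y` has approximately zero mean. A hardware implementation would typically replace the exponential and the division with cheaper approximations, which this sketch does not attempt.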

