AR · AI · Jul 4, 2025

ForgeHLS: A Large-Scale, Open-Source Dataset for High-Level Synthesis

arXiv:2507.03255v3 · 2 citations · h-index: 8 · Has Code
Originality: Synthesis-oriented
AI Analysis

This provides a resource for hardware-design and ML researchers working to advance HLS optimization, though the contribution is incremental because it builds on existing dataset efforts.

The authors tackle the shortage of large, diverse datasets for applying machine learning to High-Level Synthesis (HLS) optimization by introducing ForgeHLS, a dataset of over 400k designs generated from 846 kernels. They use it to demonstrate the dataset's utility on tasks such as Quality of Result (QoR) prediction and automated pragma exploration.

High-Level Synthesis (HLS) plays a crucial role in modern hardware design by transforming high-level code into optimized hardware implementations. However, progress in applying machine learning (ML) to HLS optimization has been hindered by a shortage of sufficiently large and diverse datasets. To bridge this gap, we introduce ForgeHLS, a large-scale, open-source dataset explicitly designed for ML-driven HLS research. ForgeHLS comprises over 400k diverse designs generated from 846 kernels covering a broad range of application domains, consuming over 200k CPU hours during dataset construction. Each kernel includes systematically automated pragma insertions (loop unrolling, pipelining, array partitioning), combined with extensive design space exploration using Bayesian optimization. Compared to existing datasets, ForgeHLS significantly enhances scale, diversity, and design coverage. We further define and evaluate representative downstream tasks in Quality of Result (QoR) prediction and automated pragma exploration, clearly demonstrating ForgeHLS's utility for developing and improving ML-based HLS optimization methodologies. The dataset and code are public at https://github.com/zedong-peng/ForgeHLS.
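To make the idea of a pragma design space concrete, the following is a minimal sketch, not ForgeHLS's actual code. It assumes one loop and one array per kernel and uses hypothetical knob ranges (unroll factors 1/2/4/8, pipelining on/off, cyclic partition factors 1/2/4). The names `PragmaConfig` and `design_space` are illustrative only, and the emitted text follows Vitis HLS-style pragma syntax. The sketch enumerates the full grid of configurations and renders each one as pragma lines. It does not implement Bayesian optimization. In the abstract's workflow, an optimizer of that kind would choose which points of such a space to synthesize.

```python
from dataclasses import dataclass
from itertools import product
from typing import Iterator, List

# Hypothetical knob ranges for a single loop and a single array;
# the ranges ForgeHLS actually explores are not given in this summary.
UNROLL_FACTORS = [1, 2, 4, 8]
PIPELINE_OPTIONS = [False, True]
PARTITION_FACTORS = [1, 2, 4]


@dataclass(frozen=True)
class PragmaConfig:
    """One point in the (hypothetical) pragma design space."""
    unroll: int
    pipeline: bool
    partition: int

    def loop_pragmas(self) -> List[str]:
        """Pragmas meant to sit inside the loop body (Vitis HLS-style text)."""
        lines: List[str] = []
        if self.pipeline:
            lines.append("#pragma HLS pipeline II=1")
        if self.unroll > 1:
            lines.append(f"#pragma HLS unroll factor={self.unroll}")
        return lines

    def array_pragmas(self, array_name: str) -> List[str]:
        """Pragmas meant to sit in function scope, after the array is declared."""
        if self.partition <= 1:
            return []
        return [
            f"#pragma HLS array_partition variable={array_name} "
            f"type=cyclic factor={self.partition} dim=1"
        ]


def design_space() -> Iterator[PragmaConfig]:
    """Yield every combination of knob values (a full-factorial grid)."""
    for u, p, f in product(UNROLL_FACTORS, PIPELINE_OPTIONS, PARTITION_FACTORS):
        yield PragmaConfig(unroll=u, pipeline=p, partition=f)


if __name__ == "__main__":
    configs = list(design_space())
    print(len(configs))  # 4 * 2 * 3 = 24 configurations
    cfg = PragmaConfig(unroll=4, pipeline=True, partition=4)
    for line in cfg.loop_pragmas() + cfg.array_pragmas("A"):
        print(line)
```

The grid size is the product of the per-knob option counts. A kernel with several loops and arrays therefore grows multiplicatively, which is why a sampling strategy such as Bayesian optimization is used to pick a subset of configurations to synthesize rather than evaluating all of them. Whether a given combination is legal or beneficial is decided by the HLS tool, not by this sketch.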
