28.4 LG May 9
Machine Learning-Based Graph Simplification for Symbolic Accelerators
Tiffany Yu, Rye Stahle-Smith, Darssan Eswaramoorthi et al.
Graph-based accelerators have been widely adopted in symbolic data processing applications such as genomics, cybersecurity, and artificial intelligence. However, these systems often suffer from excessive memory usage and inefficiencies stemming from redundant graph structures. We present AutoSlim, a machine learning-based framework that leverages data-driven methods to prune automata graphs for hardware accelerators. Using features extracted from prior graph executions and a Random Forest classifier, AutoSlim identifies and removes low-impact nodes and edges. When applied to a Non-deterministic Finite Automata overlay architecture (NAPOLY+), AutoSlim reduces FPGA resource usage by up to 40%, with corresponding improvements in throughput and power efficiency. The framework includes a verification step to ensure functional equivalence after pruning and suggests promising directions for both hardware optimization and security.
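The abstract does not say how the verification step works. As a minimal sketch, assuming a sampling-based check rather than a formal proof, one could compare the accept/reject decisions of the original and pruned automata on a set of test strings. The function names, the `(source, symbol, target)` transition format, and the toy automata below are all hypothetical.

```python
def accepts(transitions, start, accepting, text):
    """Simulate an NFA given as (source, symbol, target) triples."""
    current = {start}
    for ch in text:
        current = {dst for (src, sym, dst) in transitions
                   if src in current and sym == ch}
        if not current:
            return False
    return bool(current & accepting)


def agree_on(samples, original, pruned, start, accepting):
    """True if both automata give the same answer on every sample string."""
    return all(
        accepts(original, start, accepting, s) == accepts(pruned, start, accepting, s)
        for s in samples
    )


# Hypothetical toy automaton: state 3 is a dead end, so removing it is safe.
original = [(0, "a", 1), (1, "b", 2), (0, "a", 3)]
pruned = [(0, "a", 1), (1, "b", 2)]
over_pruned = [(0, "a", 1)]
samples = ["ab", "a", "b", "abb", ""]

safe = agree_on(samples, original, pruned, 0, {2})
unsafe = agree_on(samples, original, over_pruned, 0, {2})
```

Agreement on samples only shows that no counterexample was found among the tested strings; it does not establish functional equivalence in general.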
37.5 CR May 9
Hardware-Accelerated Line-Rate Bitstream Screening for Secure FPGA Reconfiguration
Rye Stahle-Smith, Carter Antley, Jason D. Bakos et al.
As Field-Programmable Gate Arrays (FPGAs) scale in multi-tenant cloud and edge-AI environments, the configuration bitstream has become a critical, yet opaque, security boundary. Existing hardware Trojan detection methods often rely on trusted design artifacts or computationally intensive reverse-engineering, introducing prohibitive latencies in dynamic, "just-in-time" reconfiguration workflows. This paper presents BLADEI (Bitstream-Level Abnormality Detection for Embedded Inference), a bitstream-level security framework designed for deployment-time screening of FPGA configurations without requiring source code, netlists, or vendor-specific tooling. BLADEI introduces a hybrid architecture that combines multi-scale byte-sequence learning with compact statistical representations to detect anomalous configurations directly from raw bitstreams. We implement the framework on a Xilinx PYNQ-Z1 system, demonstrating an end-to-end cloud-to-edge pipeline that enforces security prior to FPGA configuration. Evaluated across 1,383 bitstreams, BLADEI achieves a macro F1-score of 0.91. However, our systems-level characterization reveals a "preprocessing wall": software-based feature extraction accounts for 92% of the total 16.4-second latency, while model inference requires only 1.4 seconds. To address this bottleneck, we propose a streaming hardware-accelerated feature extraction engine designed for the FPGA programmable logic (PL). The evaluation shows that the PL-based streaming engine can reduce feature-extraction latency to the millisecond range. This work positions bitstream-level screening as a first-class primitive and demonstrates that hardware-accelerated preprocessing is the key enabler for securing next-generation reconfigurable custom computing machines at line rate.
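The "preprocessing wall" figures can be turned into a back-of-the-envelope estimate. The sketch below uses only the rounded numbers quoted in the abstract (16.4 s total, 92% in feature extraction) and assumes that every other stage is unchanged by the accelerator. The 5 ms accelerated latency is a hypothetical placeholder for "millisecond range", not a reported measurement.

```python
TOTAL_LATENCY_S = 16.4       # reported end-to-end latency
PREPROCESS_FRACTION = 0.92   # reported share spent in software feature extraction


def projected_total(accelerated_preprocess_s: float) -> float:
    """End-to-end latency if only the feature-extraction stage changes."""
    other_s = TOTAL_LATENCY_S * (1.0 - PREPROCESS_FRACTION)
    return other_s + accelerated_preprocess_s


# Time currently spent in software feature extraction (about 15.1 s).
preprocess_s = TOTAL_LATENCY_S * PREPROCESS_FRACTION

# Assumed value standing in for "millisecond range"; not from the paper.
HYPOTHETICAL_PL_LATENCY_S = 0.005

new_total = projected_total(HYPOTHETICAL_PL_LATENCY_S)
speedup = TOTAL_LATENCY_S / new_total
```

Because the quoted percentage is rounded, the non-preprocessing remainder computed here (about 1.3 s) differs slightly from the 1.4 s quoted for inference. Either way, the remaining stages dominate once feature extraction moves to hardware.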
CR Sep 2, 2025
Real-time ML-based Defense Against Malicious Payload in Reconfigurable Embedded Systems
Rye Stahle-Smith, Rasha Karakchi
The growing use of FPGAs in reconfigurable systems introduces security risks through malicious bitstreams that could cause denial-of-service (DoS), data leakage, or covert attacks. We investigated chip-level malicious hardware payloads in embedded systems and proposed a supervised machine learning method to detect malicious bitstreams via static byte-level features. Our approach diverges from existing methods by analyzing bitstreams directly at the binary level, enabling real-time detection without requiring access to source code or netlists. Bitstreams were sourced from state-of-the-art (SOTA) benchmarks and re-engineered to target the Xilinx PYNQ-Z1 FPGA Development Board. Our dataset included 122 samples of benign and malicious configurations. The data were vectorized using byte frequency analysis, compressed using truncated SVD (TSVD), and balanced using SMOTE to address class imbalance. Among the evaluated classifiers, Random Forest achieved a macro F1-score of 0.97, underscoring the viability of real-time Trojan detection on resource-constrained systems. The final model was serialized and successfully deployed via PYNQ to enable integrated bitstream analysis.
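The vectorization step can be illustrated with a minimal sketch. It assumes that "byte frequency analysis" means a normalized 256-bin histogram of byte values, which the abstract does not spell out. The function name and the four-byte sample are hypothetical, and the later stages (TSVD, SMOTE, Random Forest) are not shown.

```python
from collections import Counter


def byte_frequency_vector(data: bytes) -> list[float]:
    """Return a 256-element vector of relative byte frequencies."""
    if not data:
        return [0.0] * 256
    counts = Counter(data)  # iterating over bytes yields ints in 0..255
    total = len(data)
    return [counts.get(value, 0) / total for value in range(256)]


# Hypothetical stand-in for a bitstream; a real one would be read from a file.
sample = bytes([0x00, 0x00, 0xFF, 0xAA])
vector = byte_frequency_vector(sample)
```

Each bitstream maps to a fixed-length vector regardless of its size, so files of different lengths can feed the same classifier.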
LG Jul 11, 2025
ML-Based Automata Simplification for Symbolic Accelerators
Tiffany Yu, Rye Stahle-Smith, Darssan Eswaramoorthi et al.
Symbolic accelerators are increasingly used for symbolic data processing in domains such as genomics, NLP, and cybersecurity. However, these accelerators face scalability issues due to excessive memory use and routing complexity, especially when targeting a large set. We present AutoSlim, a machine learning-based graph simplification framework designed to reduce the complexity of symbolic accelerators built on Non-deterministic Finite Automata (NFA) deployed on FPGA-based overlays such as NAPOLY+. AutoSlim uses Random Forest classification to prune low-impact transitions based on edge scores and structural features, significantly reducing automata graph density while preserving semantic correctness. Unlike prior tools, AutoSlim targets automated score-aware simplification with weighted transitions, enabling efficient ranking-based sequence analysis. We evaluated datasets (1K to 64K nodes) on NAPOLY+ and measured latency, throughput, and resource usage. AutoSlim achieves up to a 40 percent reduction in FPGA LUTs and over 30 percent pruning of transitions, while scaling to graphs an order of magnitude larger than existing benchmarks. Our results also demonstrate that hardware interconnection (fanout) heavily influences hardware cost and that AutoSlim's pruning mitigates resource blowup.
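Score-aware pruning of weighted transitions can be sketched as follows. This is an illustration under stated assumptions, not AutoSlim's implementation: a fixed score threshold stands in for the Random Forest classifier, and the transition format, function name, and toy graph are hypothetical. After low-scoring transitions are dropped, states no longer reachable from the start state are removed too.

```python
from collections import deque


def prune_transitions(transitions, start, keep):
    """Drop transitions rejected by `keep`, then drop unreachable ones.

    Each transition is a (source, symbol, target, score) tuple.
    """
    kept = [t for t in transitions if keep(t)]

    # Build adjacency lists from the surviving transitions.
    adjacency = {}
    for src, _sym, dst, _score in kept:
        adjacency.setdefault(src, []).append(dst)

    # Breadth-first search from the start state.
    reachable = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for nxt in adjacency.get(state, []):
            if nxt not in reachable:
                reachable.add(nxt)
                queue.append(nxt)

    return [t for t in kept if t[0] in reachable]


# Hypothetical toy graph; the 0.5 threshold stands in for a classifier.
graph = [(0, "a", 1, 0.9), (0, "b", 2, 0.1), (2, "c", 3, 0.8), (1, "d", 3, 0.7)]
pruned = prune_transitions(graph, start=0, keep=lambda t: t[3] >= 0.5)
```

In this toy case the low-scoring transition into state 2 is removed, which also strands the transition leaving state 2. Pruning of this kind does not by itself guarantee that the accepted language is unchanged.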
LG Mar 8, 2025
AI-Driven Optimization of Hardware Overlay Configurations
Rasha Karakchi
Designing and optimizing FPGA overlays is a complex and time-consuming process, often requiring multiple trial-and-error iterations to determine a suitable configuration. This paper presents an AI-driven approach to optimizing FPGA overlay configurations, specifically focusing on the NAPOLY+ automata processor implemented on the ZCU104 FPGA. By leveraging machine learning techniques, particularly Random Forest regression, we predict the feasibility and efficiency of different configurations before hardware compilation. Our method significantly reduces the number of required iterations by estimating resource utilization, including logical elements, distributed memory, and fanout, based on historical design data. Experimental results demonstrate that our model achieves high prediction accuracy, closely matching actual resource usage while accelerating the design process.