AI Sep 12, 2022
Mining SoC Message Flows with Attention Model
Md Rubel Ahmed, Bardia Nadimi, Hao Zheng
High-quality system-level message flow specifications are necessary for comprehensive validation of system-on-chip (SoC) designs. However, manual development and maintenance of such specifications are daunting tasks. We propose a disruptive method that utilizes deep sequence modeling with the attention mechanism to infer accurate flow specifications from SoC communication traces. The proposed method can overcome the inherent complexity of SoC traces induced by the concurrent executions of SoC designs that existing mining tools often find extremely challenging. We conduct experiments on five highly concurrent traces and find that the proposed approach outperforms several existing state-of-the-art trace mining tools.
LG Mar 9, 2022
Deep Bidirectional Transformers for SoC Flow Specification Mining
Md Rubel Ahmed, Hao Zheng
High-quality system-level message flow specifications can lead to comprehensive validation of system-on-chip (SoC) designs. We propose a disruptive method that utilizes an attention mechanism to produce accurate flow specifications from SoC IP communication traces. The proposed method can overcome the inherent complexity of SoC traces induced by the concurrency and parallelism of multicore designs that existing flow specification mining tools often find extremely challenging. Experiments on highly interleaved traces show promising flow reconstruction compared to several tools dedicated to the flow specification mining problem.
CV Aug 31, 2024
Data Augmentation for Image Classification using Generative AI
Fazle Rahat, M Shifat Hossain, Md Rubel Ahmed et al.
Scaling laws dictate that the performance of AI models is proportional to the amount of available data. Data augmentation is a promising way to expand dataset size. Traditional approaches focused on augmentation using rotation, translation, and resizing. Recent approaches use generative AI models to improve dataset diversity. However, the generative methods struggle with issues such as subject corruption and the introduction of irrelevant artifacts. In this paper, we propose Automated Generative Data Augmentation (AGA). The framework combines the utility of large language models (LLMs), diffusion models, and segmentation models to augment data. AGA preserves foreground authenticity while ensuring background diversity. Specific contributions include: i) segment and superclass based object extraction, ii) prompt diversity with combinatorial complexity using prompt decomposition, and iii) affine subject manipulation. We evaluate AGA against state-of-the-art (SOTA) techniques on three representative datasets: ImageNet, CUB, and iWildCam. The experimental evaluation demonstrates accuracy improvements of 15.6% and 23.5% on in-distribution and out-of-distribution data, respectively, compared to baseline models. There is also a 64.3% improvement in SIC score compared to the baselines.
LG Feb 22
CTS-Bench: Benchmarking Graph Coarsening Trade-offs for GNNs in Clock Tree Synthesis
Barsat Khadka, Kawsher Roxy, Md Rubel Ahmed
Graph Neural Networks (GNNs) are increasingly explored for physical design analysis in Electronic Design Automation, particularly for modeling Clock Tree Synthesis behavior such as clock skew and buffering complexity. However, practical deployment remains limited due to the prohibitive memory and runtime cost of operating on raw gate-level netlists. Graph coarsening is commonly used to improve scalability, yet its impact on CTS-critical learning objectives is not well characterized. This paper introduces CTS-Bench, a benchmark suite for systematically evaluating the trade-offs between graph coarsening, prediction accuracy, and computational efficiency in GNN-based CTS analysis. CTS-Bench consists of 4,860 converged physical design solutions spanning five architectures and provides paired raw gate-level and clustered graph representations derived from post-placement designs. Using clock skew prediction as a representative CTS task, we demonstrate a clear accuracy-efficiency trade-off. While graph coarsening reduces GPU memory usage by up to 17.2x and accelerates training by up to 3x, it also removes structural information essential for modeling clock distribution, frequently resulting in negative $R^2$ scores under zero-shot evaluation. Our findings indicate that generic graph clustering techniques can fundamentally compromise CTS learning objectives, even when global physical metrics remain unchanged. CTS-Bench enables principled evaluation of CTS-aware graph coarsening strategies, supports benchmarking of GNN architectures and accelerators under realistic physical design constraints, and provides a foundation for developing learning-assisted CTS analysis and optimization techniques.
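To make the "negative $R^2$" result concrete, here is a minimal sketch of the coefficient of determination, $R^2 = 1 - SS_{res}/SS_{tot}$. The skew values and predictions are hypothetical numbers chosen for illustration, not data from CTS-Bench. A model scores below zero whenever its squared error exceeds that of simply predicting the mean of the targets.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # model error
    ss_tot = sum((t - mean) ** 2 for t in y_true)               # mean-baseline error
    return 1.0 - ss_res / ss_tot


# Hypothetical clock-skew targets and predictions (arbitrary units).
skew_true = [1.0, 2.0, 3.0, 4.0]
good_pred = [1.1, 1.9, 3.2, 3.8]   # close to the targets
mean_pred = [2.5, 2.5, 2.5, 2.5]   # always predicts the mean
bad_pred = [4.0, 3.0, 2.0, 1.0]    # anti-correlated with the targets

print(r2_score(skew_true, good_pred))  # close to 1
print(r2_score(skew_true, mean_pred))  # 0.0
print(r2_score(skew_true, bad_pred))   # -3.0
```

A negative score therefore means the predictor is worse than a constant baseline on the evaluated designs, which is a stronger statement than "low accuracy".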
AR Mar 15, 2024
AutoHLS: Learning to Accelerate Design Space Exploration for HLS Designs
Md Rubel Ahmed, Toshiaki Koike-Akino, Kieran Parsons et al.
High-level synthesis (HLS) is a design flow that leverages modern language features and flexibility, such as complex data structures, inheritance, and templates, to prototype hardware designs rapidly. However, exploring the design space to meet specific design specifications can cost hardware engineers considerable time and effort. This paper proposes a novel framework called AutoHLS, which integrates a deep neural network (DNN) with Bayesian optimization (BO) to accelerate HLS hardware design optimization. Our tool focuses on HLS pragma exploration and operation transformation. It utilizes integrated DNNs to predict synthesizability within a given FPGA resource budget. We also investigate the potential of emerging quantum neural networks (QNNs) instead of classical DNNs for the AutoHLS pipeline. Our experimental results demonstrate up to a 70-fold speedup in exploration time.
IR Feb 13, 2021
Model Synthesis for Communication Traces of System-on-Chip Designs
Hao Zheng, Md Rubel Ahmed, Parijat Mukherjee et al.
Concise and abstract models of system-level behaviors are invaluable in design analysis, testing, and validation. In this paper, we consider the problem of inferring models from communication traces of system-on-chip (SoC) designs. The traces capture communications among different blocks of an SoC design in terms of messages exchanged. The extracted models characterize the system-level communication protocols governing how blocks exchange messages and coordinate with each other to realize various system functions. The problem is formulated as a constraint satisfaction problem, which is then fed to an SMT solver. The solutions returned by the SMT solver are used to extract the models that accept the input traces. In the experiments, we demonstrate the proposed approach with traces collected from a transaction-level simulation model of a multicore SoC design and traces of a more detailed multicore SoC design developed in the GEM5 environment.
SE May 22, 2020
Mining Message Flows from System-on-Chip Execution Traces
MD Rubel Ahmed, Hao Zheng, Parijat Mukherjee et al.
Comprehensive and well-defined specifications are necessary to perform rigorous and thorough validation of system-on-chip (SoC) designs. Message flows specify how components of an SoC design communicate and coordinate with each other to realize various system functions. Message flow specifications are essential for efficient system-level validation and debugging of SoC designs. However, in practice such specifications are usually unavailable, and those that exist are often ambiguous, incomplete, or even erroneous. This paper addresses that problem by proposing a specification mining framework, FlowMiner, that automatically extracts message flows from SoC execution traces, which, unlike software traces, show a high degree of concurrency. A set of inference rules and optimization techniques are presented to improve mining performance and reduce mining complexity. Evaluation of this framework in several experiments shows promising results.
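To illustrate why concurrency makes these traces hard to mine, the sketch below checks whether a trace can be explained as an interleaving of complete instances of a given set of flows. It is not FlowMiner's algorithm: the function `accepts`, the restriction to linear flows, and the message names are all assumptions made for this example. It only shows the acceptance condition that a mined set of flows would need to satisfy on a concurrent trace.

```python
from functools import lru_cache


def accepts(flows, trace):
    """Return True if `trace` splits into interleaved, complete flow instances.

    flows: list of message-name sequences (each flow is assumed linear).
    trace: sequence of message names observed on the SoC interconnect.
    """
    flows = [tuple(f) for f in flows]
    trace = tuple(trace)

    @lru_cache(maxsize=None)
    def search(i, active):
        # `active` is a sorted tuple of (flow_index, next_position) pairs,
        # one pair per flow instance that has started but not finished.
        if i == len(trace):
            return not active  # every started instance must have completed
        msg = trace[i]
        candidates = set()
        # Option 1: advance an in-flight instance that expects `msg`.
        for k, (f, pos) in enumerate(active):
            if flows[f][pos] == msg:
                rest = active[:k] + active[k + 1:]
                if pos + 1 < len(flows[f]):
                    rest = rest + ((f, pos + 1),)
                candidates.add(tuple(sorted(rest)))
        # Option 2: start a new instance of a flow that begins with `msg`.
        for f, flow in enumerate(flows):
            if flow and flow[0] == msg:
                nxt = active + ((f, 1),) if len(flow) > 1 else active
                candidates.add(tuple(sorted(nxt)))
        return any(search(i + 1, c) for c in candidates)

    return search(0, ())


# Hypothetical flows: a read transaction and a write transaction.
flows = [("rd_req", "rd_resp"), ("wr_req", "wr_ack")]
interleaved = ["rd_req", "wr_req", "rd_resp", "rd_req", "wr_ack", "rd_resp"]
print(accepts(flows, interleaved))           # True
print(accepts(flows, ["rd_req", "wr_ack"]))  # False
```

Even in this simplified setting, each message may either start a new instance or continue any compatible in-flight one, so the number of candidate explanations grows quickly with the degree of interleaving.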