57.7 SY Jun 1
Package-Embedded Coupled Inductor Arrays for High-Performance Computing Power Delivery
Rami Rasheedi, Salma Abdelzaher, Inna Partin-Vaisband
A novel power delivery framework, comprising a package-embedded inductor topology and an inductance-island methodology, is introduced to maximize both inductance and current densities in vertical power delivery (VPD). The framework leverages multiple multi-phase converters, a common strategy in high-performance computing systems, to enhance efficiency and scalability. The proposed topology employs an array of tightly coupled spiral square inductors sharing a common magnetic rod, serving multiple converters operating in the same conversion phase. The array is optimized to maximize coupling and minimize conversion losses, achieving superior inductance and current densities of 250 nH/mm^2 and 10 A/mm^2, respectively. At the system level, the inductance-island methodology partitions the power delivery network into multiple islands, each dedicated to a converter phase and supplying a portion of the load current, thereby enabling scalable and efficient distribution. To validate the framework, the inductor array is designed and simulated in ANSYS Maxwell 3D and Mechanical, exhibiting an average quality factor of 23.6 and efficiency of 97.4% at 2 A load current, 6 V input, and 10 MHz switching frequency. The inductor array netlist is extracted from ANSYS and co-designed in Cadence Virtuoso with a distributed dual-phase power conversion system, ensuring joint optimization of passive and active components. The co-designed converter achieves a significant efficiency gain of 5.65% on average and up to 11.04% at 40 A load over a similar converter with uncoupled inductors, demonstrating the practical benefits of the approach.
0.1 LG May 31
PALTO: Physics-Informed Active Learning for Tri-Gate FinFET Design Optimization for Vertical Power Delivery
Ayoub Sadeghi, Leonid Popryho, Inna Partin-Vaisband
This paper demonstrates the effectiveness of machine learning-driven optimization for designing application-specific GaN tri-gate FinFETs in vertical power delivery systems. Conventional TCAD-based approaches are computationally intensive and insufficient for navigating the high-dimensional, nonlinear design space of advanced GaN devices. To address this, a physics-informed active learning framework is used to intelligently guide simulations, accelerating convergence while preserving accuracy. This ML-guided approach enables the discovery of optimal configurations by efficiently exploring key structural parameters, most notably the GaN-to-AlGaN thickness ratio, a long-standing focus of debate in device design. Through this exploration, two optimized devices with aggressively scaled gate-to-drain lengths are identified. Single-fin, multi-channel simulations show that device D2, with a thinner GaN channel relative to the AlGaN barrier, achieves higher drive current. However, in a 300-fin configuration, device D1 outperforms device D2 by delivering 3.3 A at 0.49 ohm on-resistance (approximately 2$\times$ better) despite slightly higher parasitics. Both devices operate in a normally-off mode. Based on an application-specific figure of merit, device D1 achieves 5 pC$\cdot$ohm, demonstrating 2$\times$ greater switching efficiency than device D2, while both designs outperform industrial benchmarks from different performance standpoints.
51.3 SY May 24
Power-Integrity Modeling of VR Faults in High-Performance Applications
Sriharini Krishnakumar, Inna Partin-Vaisband
Distributed vertical power delivery has emerged as a promising approach to meet aggressive current-density, efficiency, and transient response requirements in high-performance computing systems. Tight integration of voltage regulators within stacked substrates, however, increases the vulnerability of the power delivery system to short-circuit and open-circuit faults arising from elevated thermal and mechanical stresses. Such faults can propagate through the shared power delivery network, leading to rapid degradation of system-wide efficiency at worst-case rates of up to 0.5% per microsecond. Advanced fault-tolerant power management strategies are therefore required to ensure efficient power delivery. A real-time fault-detection and isolation methodology is proposed in this paper for vertical power delivery systems. The methodology is based on analytical inductor-current models that rely solely on signals available within the converter control circuitry, thereby eliminating additional sensing overhead. The proposed framework is designed and simulated in a SPICE environment, demonstrating sub-microsecond fault detection and effective dual-fuse isolation, maintaining uninterrupted power delivery with a system-wide efficiency degradation of less than 2%.
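The abstract does not give the analytical inductor-current models, so the following is only a minimal sketch of the general idea, assuming an ideal buck phase whose inductor current ramps at (V_in - V_out)/L while the high-side switch is on and at -V_out/L while it is off. The names `BuckPhase`, `predicted_delta_i`, and `detect_fault`, the fixed tolerance, and the `observed_currents` input (a stand-in for whatever current-related signal the controller already has) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class BuckPhase:
    v_in: float        # input voltage (V)
    v_out: float       # regulated output voltage (V)
    inductance: float  # phase inductance (H)


def predicted_delta_i(phase: BuckPhase, gate_high: bool, dt: float) -> float:
    """Ideal inductor-current change over dt for the given switch state."""
    v_l = (phase.v_in - phase.v_out) if gate_high else -phase.v_out
    return v_l / phase.inductance * dt


def detect_fault(phase: BuckPhase,
                 gate_states: Sequence[bool],
                 observed_currents: Sequence[float],
                 dt: float,
                 tolerance: float) -> Optional[int]:
    """Return the first sample index where the observed current deviates
    from the model by more than `tolerance` (A), or None if it never does.

    observed_currents has one more entry than gate_states: the initial
    current followed by one value per switching interval.
    """
    estimate = observed_currents[0]
    for k, gate_high in enumerate(gate_states, start=1):
        estimate += predicted_delta_i(phase, gate_high, dt)
        if abs(observed_currents[k] - estimate) > tolerance:
            return k
    return None
```

With a 10 ns step, the index returned for a faulty trace times the step gives a rough detection latency, which is the quantity the abstract reports as sub-microsecond.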
47.8 SY May 24
Dynamic Power Management Methodology for Distributed Vertical Power Delivery in High-Performance Computing Systems
Sriharini Krishnakumar, Inna Partin-Vaisband
Distributed vertical power delivery (DVPD) architectures employ multiple parallel voltage regulators (VRs) to meet the high-power and high-current-density demands of modern high-performance computing (HPC) systems. While full parallel activation maximizes efficiency near peak load, medium-to-light-load operation leads to efficiency degradation when all VRs remain active due to persistent switching and gate drive losses. This work proposes a load-aware power system activation framework targeted at the medium-to-light-load regime, in which the number of active VRs scales proportionally with instantaneous load power. A spatially informed selection strategy determines which VRs are activated from the available pool, aligning regulator placement with localized power demand. This locality-aware activation minimizes lateral redistribution currents within the power plane and reduces conduction losses and voltage drops. Simulation results on a representative DVPD system demonstrate 2x to 3x switching loss reduction relative to conventional full-parallel light-load operation, while sustaining an approximately 87% efficiency plateau across the 5% to 30% load range. Output ripple constraints are preserved, with inductor current ripple maintained within 6% and output voltage ripple within 2%, ensuring regulation integrity while improving overall conversion efficiency.
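The two decisions described above (how many VRs to enable, and which ones) can be sketched as follows. This is an illustration under stated assumptions, not the paper's algorithm: the count is taken as the ceiling of load fraction times pool size, and locality is approximated by ranking VRs by distance to the power-weighted load centroid. The function names and the `(x, y, power)` load-map format are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def active_vr_count(load_fraction: float, total_vrs: int) -> int:
    """Number of VRs to enable: proportional to load, never fewer than one."""
    if not 0.0 <= load_fraction <= 1.0:
        raise ValueError("load_fraction must lie in [0, 1]")
    return max(1, math.ceil(load_fraction * total_vrs))


def select_vrs(vr_positions: Sequence[Tuple[float, float]],
               load_map: Sequence[Tuple[float, float, float]],
               load_fraction: float) -> List[int]:
    """Return indices of the VRs closest to the power-weighted load centroid.

    load_map entries are (x, y, power); the centroid heuristic is an
    assumption made for this sketch.
    """
    total_power = sum(p for _, _, p in load_map)
    cx = sum(x * p for x, _, p in load_map) / total_power
    cy = sum(y * p for _, y, p in load_map) / total_power
    n = active_vr_count(load_fraction, len(vr_positions))
    ranked = sorted(range(len(vr_positions)),
                    key=lambda i: math.dist(vr_positions[i], (cx, cy)))
    return sorted(ranked[:n])
```

For example, with four VRs at the corners of a 10 x 10 plane and the load concentrated near (9, 9), a 25% load fraction enables only the VR at (10, 10).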
CR Nov 11, 2025
Automated Hardware Trojan Insertion in Industrial-Scale Designs
Yaroslav Popryho, Debjit Pal, Inna Partin-Vaisband
Industrial Systems-on-Chips (SoCs) often comprise hundreds of thousands to millions of nets and millions to tens of millions of connectivity edges, making empirical evaluation of hardware-Trojan (HT) detectors on realistic designs both necessary and difficult. Public benchmarks remain significantly smaller and hand-crafted, while releasing truly malicious RTL raises ethical and operational risks. This work presents an automated and scalable methodology for generating HT-like patterns in industry-scale netlists whose purpose is to stress-test detection tools without altering user-visible functionality. The pipeline (i) parses large gate-level designs into connectivity graphs, (ii) explores rare regions using SCOAP testability metrics, and (iii) applies parameterized, function-preserving graph transformations to synthesize trigger-payload pairs that mimic the statistical footprint of stealthy HTs. When evaluated on the benchmarks generated in this work, representative state-of-the-art graph-learning models fail to detect Trojans. The framework closes the evaluation gap between academic circuits and modern SoCs by providing reproducible challenge instances that advance security research without sharing step-by-step attack instructions.
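Step (ii) of the pipeline relies on SCOAP testability metrics. As a minimal sketch of how such metrics can flag hard-to-control nets, the code below computes the standard combinational controllability values (CC0, CC1) for AND, OR, and NOT gates. The netlist format, the function names, and the thresholding rule in `rare_nets` are assumptions for illustration; the abstract does not describe how the paper uses the metrics.

```python
from typing import Dict, List, Sequence, Tuple

Gate = Tuple[str, str, Sequence[str]]  # (output_net, gate_type, input_nets)


def scoap_controllability(primary_inputs: Sequence[str],
                          gates: Sequence[Gate]) -> Dict[str, Tuple[int, int]]:
    """Compute (CC0, CC1) per net; gates must be in topological order."""
    cc = {net: (1, 1) for net in primary_inputs}
    for out, kind, ins in gates:
        cc0s = [cc[i][0] for i in ins]
        cc1s = [cc[i][1] for i in ins]
        if kind == "AND":
            # Output 0 needs any one input at 0; output 1 needs all inputs at 1.
            cc[out] = (min(cc0s) + 1, sum(cc1s) + 1)
        elif kind == "OR":
            # Output 0 needs all inputs at 0; output 1 needs any one input at 1.
            cc[out] = (sum(cc0s) + 1, min(cc1s) + 1)
        elif kind == "NOT":
            cc[out] = (cc1s[0] + 1, cc0s[0] + 1)
        else:
            raise ValueError(f"unsupported gate type: {kind}")
    return cc


def rare_nets(cc: Dict[str, Tuple[int, int]], threshold: int) -> List[str]:
    """Nets where at least one logic value is hard to set (hypothetical rule)."""
    return sorted(net for net, (c0, c1) in cc.items() if max(c0, c1) >= threshold)
```

Nets deep inside wide AND or OR cones accumulate large CC1 or CC0 values, which is why they are natural candidates for rarely activated trigger conditions.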
2.7 LG Mar 31
From Physics to Surrogate Intelligence: A Unified Electro-Thermo-Optimization Framework for TSV Networks
Mohamed Gharib, Leonid Popryho, Inna Partin-Vaisband
High-density through-substrate vias (TSVs) enable 2.5D/3D heterogeneous integration but introduce significant signal-integrity and thermal-reliability challenges due to electrical coupling, insertion loss, and self-heating. Conventional full-wave finite-element method (FEM) simulations provide high accuracy but become computationally prohibitive for large design-space exploration. This work presents a scalable electro-thermal modeling and optimization framework that combines physics-informed analytical modeling, graph neural network (GNN) surrogates, and full-wave sign-off validation. A multi-conductor analytical model computes broadband S-parameters and effective anisotropic thermal conductivities of TSV arrays, achieving $5\%-10\%$ relative Frobenius error (RFE) across array sizes up to $15 \times 15$. A physics-informed GNN surrogate (TSV-PhGNN), trained on analytical data and fine-tuned with HFSS simulations, generalizes to larger arrays with RFE below $2\%$ and nearly constant variance. The surrogate is integrated into a multi-objective Pareto optimization framework targeting reflection coefficient, insertion loss, worst-case crosstalk (NEXT/FEXT), and effective thermal conductivity. Millions of TSV configurations can be explored within minutes, enabling exhaustive layout and geometric optimization that would be infeasible using FEM alone. Final designs are validated with Ansys HFSS and Mechanical, showing strong agreement. The proposed framework enables rapid electro-thermal co-design of TSV arrays while reducing per-design evaluation time by more than six orders of magnitude.
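Two quantities named above lend themselves to a short sketch: the relative Frobenius error between a predicted and a reference S-parameter matrix, and a Pareto filter over candidate designs. The RFE is taken here as the Frobenius norm of the difference divided by the Frobenius norm of the reference, which is the usual definition but is not spelled out in the abstract. The Pareto filter assumes every objective is minimized, so a quantity to be maximized (such as thermal conductivity) would be negated first. Function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def relative_frobenius_error(predicted: Sequence[Sequence[complex]],
                             reference: Sequence[Sequence[complex]]) -> float:
    """||predicted - reference||_F / ||reference||_F for equal-shape matrices."""
    num = sum(abs(p - r) ** 2
              for p_row, r_row in zip(predicted, reference)
              for p, r in zip(p_row, r_row))
    den = sum(abs(r) ** 2 for r_row in reference for r in r_row)
    return math.sqrt(num / den)


def pareto_front(points: Sequence[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Return the points not dominated by any other (all objectives minimized)."""
    def dominates(a: Tuple[float, ...], b: Tuple[float, ...]) -> bool:
        return (all(x <= y for x, y in zip(a, b))
                and any(x < y for x, y in zip(a, b)))

    return [p for p in points if not any(dominates(q, p) for q in points)]
```

The filter is quadratic in the number of candidates, which is fine for an illustration; screening millions of configurations, as the abstract describes, would call for a more scalable method.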
LG Nov 21, 2025
GANGR: GAN-Assisted Scalable and Efficient Global Routing Parallelization
Hadi Khodaei Jooshin, Inna Partin-Vaisband
Global routing is a critical stage in electronic design automation (EDA) that enables early estimation and optimization of the routability of modern integrated circuits with respect to congestion, power dissipation, and design complexity. Batching is a primary concern in top-performing global routers, grouping nets into manageable sets to enable parallel processing and efficient resource usage. This process improves memory usage, scalable parallelization on modern hardware, and routing congestion by controlling net interactions within each batch. However, conventional batching methods typically depend on heuristics that are computationally expensive and can lead to suboptimal results (oversized batches with conflicting nets, excessive batch counts degrading parallelization, and longer batch generation times), ultimately limiting scalability and efficiency. To address these limitations, a novel batching algorithm enhanced with Wasserstein generative adversarial networks (WGANs) is introduced in this paper, enabling more effective parallelization by generating fewer, higher-quality batches in less time. The proposed algorithm is tested on the latest ISPD'24 contest benchmarks, demonstrating up to 40% runtime reduction with only 0.002% degradation in routing quality compared to a state-of-the-art router.
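To make the batching problem concrete, here is a minimal sketch of a conventional greedy heuristic of the kind the abstract contrasts against; it is not the WGAN-enhanced algorithm. It assumes two nets conflict when their bounding boxes overlap and places each net in the first batch with which it has no conflict. The box format and function names are hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive


def boxes_overlap(a: Box, b: Box) -> bool:
    """True if two inclusive bounding boxes share at least one grid cell."""
    return not (a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1])


def greedy_batches(net_boxes: Dict[str, Box]) -> List[List[str]]:
    """Assign each net to the first batch in which it conflicts with no net."""
    batches: List[List[str]] = []
    for net, box in net_boxes.items():
        for batch in batches:
            if all(not boxes_overlap(box, net_boxes[other]) for other in batch):
                batch.append(net)
                break
        else:
            # No existing batch is conflict-free for this net.
            batches.append([net])
    return batches
```

Nets within one batch can be routed in parallel. The pairwise overlap checks are what make this kind of heuristic expensive on large designs, and the result depends on the order in which nets are visited.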
LG Aug 22, 2025
Fast and Accurate RFIC Performance Prediction via Pin Level Graph Neural Networks and Probabilistic Flow
Anahita Asadi, Leonid Popryho, Inna Partin-Vaisband
Accurately predicting the performance of active radio frequency (RF) circuits is essential for modern wireless systems but remains challenging due to highly nonlinear, layout-sensitive behavior and the high computational cost of traditional simulation tools. Existing machine learning (ML) surrogates often require large datasets to generalize across various topologies or to accurately model skewed and multi-modal performance metrics. In this work, a lightweight, data-efficient, and topology-aware graph neural network (GNN) model is proposed for predicting key performance metrics of multiple topologies of active RF circuits such as low noise amplifiers (LNAs), mixers, voltage-controlled oscillators (VCOs), and power amplifiers (PAs). To capture transistor-level symmetry and preserve fine-grained connectivity details, circuits are modeled at the device-terminal level, enabling scalable message passing while reducing data requirements. Masked autoregressive flow (MAF) output heads are incorporated to improve robustness in modeling complex target distributions. Experiments on datasets demonstrate high prediction accuracy, with symmetric mean absolute percentage error (sMAPE) and mean relative error (MRE) averaging 2.40% and 2.91%, respectively. Owing to the pin-level circuit-to-graph conversion and an ML architecture that robustly models the complex densities of RF metrics, the MRE is improved by 3.14x while using 2.24x fewer training samples compared to prior work, demonstrating the method's effectiveness for rapid and accurate RF circuit design automation.
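The two reported error metrics can be sketched as below. Several sMAPE variants are in use; this sketch assumes the common form that averages 2|y - y_hat| / (|y| + |y_hat|), and takes MRE as the mean of |y - y_hat| / |y|, both expressed in percent. The abstract does not state which definitions the paper uses, so treat these as assumptions.

```python
from typing import Sequence


def smape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Symmetric mean absolute percentage error, in percent (assumed variant)."""
    terms = [2.0 * abs(t - p) / (abs(t) + abs(p))
             for t, p in zip(y_true, y_pred)]
    return 100.0 * sum(terms) / len(terms)


def mean_relative_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean of |true - predicted| / |true|, in percent."""
    terms = [abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)]
    return 100.0 * sum(terms) / len(terms)
```

Both functions assume nonzero targets. sMAPE is bounded and treats over- and under-prediction more evenly than MRE, which is one reason the two are often reported together.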
AR Jun 12, 2020
A Unified Learning Platform for Dynamic Frequency Scaling in Pipelined Processors
Arash Fouman Ajirlou, Inna Partin-Vaisband
A machine learning (ML) design framework is proposed for dynamically adjusting clock frequency based on the propagation delay of individual instructions. A Random Forest model is trained to classify propagation delays in real time, utilizing current operation type, current operands, and computation history as ML features. The trained model is implemented in Verilog as an additional pipeline stage within a baseline processor. The modified system is simulated at the gate level in 45 nm CMOS technology, exhibiting a speed-up of 68% and energy reduction of 37% with coarse-grained ML classification. A speed-up of 95% is demonstrated with finer granularities at additional energy costs.
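The source of the speed-up can be illustrated with a small calculation, assuming each instruction is clocked at the period of its predicted delay class instead of the worst-case period. This is a sketch only: the class names and periods are made up, and it ignores mispredictions and the cost of the extra pipeline stage.

```python
from typing import Dict, Sequence


def speedup(predicted_classes: Sequence[str],
            class_periods_ns: Dict[str, float],
            worst_case_period_ns: float) -> float:
    """Ratio of fixed-clock runtime to per-instruction scaled-clock runtime."""
    baseline = len(predicted_classes) * worst_case_period_ns
    scaled = sum(class_periods_ns[c] for c in predicted_classes)
    return baseline / scaled
```

For instance, if three of four instructions fall in a class that runs at half the worst-case period, the ratio is 4.0 / 2.5 = 1.6. Adding more classes tightens each period toward the true delay, which is consistent with the higher speed-up the abstract reports at finer granularity.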
CV Dec 17, 2019
Progressive VAE Training on Highly Sparse and Imbalanced Data
Dmitry Utyamishev, Inna Partin-Vaisband
In this paper, we present a novel approach for training a Variational Autoencoder (VAE) on a highly imbalanced data set. The proposed training of a high-resolution VAE model begins with the training of a low-resolution core model, which can be successfully trained on an imbalanced data set. In subsequent training steps, new convolutional, upsampling, deconvolutional, and downsampling layers are iteratively attached to the model. In each iteration, the additional layers are trained based on the intermediate pretrained model, a result of previous training iterations. Thus, the resolution of the model is progressively increased up to the required resolution level. In this paper, the progressive VAE training is exploited for learning a latent representation with imbalanced, highly sparse data sets and, consequently, generating routes in a constrained 2D space. Routing problems (e.g., vehicle routing problem, travelling salesman problem, and arc routing) are of special significance in many modern applications (e.g., route planning, network maintenance, developing high-performance nanoelectronic systems, and others) and are typically associated with sparse imbalanced data. In this paper, the critical problem of routing billions of components in nanoelectronic devices is considered. The proposed approach exhibits a significant training speedup as compared with existing state-of-the-art VAE training methods, while generating expected image outputs from unseen input data. Furthermore, the final progressive VAE models exhibit a much more precise output representation than the Generative Adversarial Network (GAN) models trained with comparable training time. The proposed method is expected to be applicable to a wide range of applications, including but not limited to image inpainting, sentence interpolation, and semi-supervised learning.
ET Dec 9, 2019
Exploiting Dual-Gate Ambipolar CNFETs for Scalable Machine Learning Classification
Farid Kenarangi, Xuan Hu, Yihan Liu et al.
Ambipolar carbon-nanotube-based field-effect transistors (AP-CNFETs) exhibit unique electrical characteristics, such as tri-state operation and bi-directionality, enabling systems with complex and reconfigurable computing. In this paper, AP-CNFETs are used to design a mixed-signal machine learning (ML) classifier. The classifier is designed in SPICE with a feature size of 15 nm and operates at 250 MHz. The system is demonstrated on the MNIST digit dataset, yielding 90% accuracy and no accuracy degradation as compared with the classification of this dataset in Python. The system also exhibits lower power consumption and smaller physical size as compared with the state-of-the-art CMOS and memristor-based mixed-signal classifiers.
SP Oct 21, 2019
A Single-MOSFET MAC for Confidence and Resolution (CORE) Driven Machine Learning Classification
Farid Kenarangi, Inna Partin-Vaisband
Mixed-signal machine-learning classification has recently been demonstrated as an efficient alternative to classification with power-hungry digital circuits. In this paper, a high-COnfidence high-REsolution (CORE) mixed-signal classifier is proposed for classifying high-dimensional input data into a multi-class output space with less power and area than state-of-the-art classifiers. A high-resolution multiplication is facilitated within a single MOSFET by feeding the features and feature weights into, respectively, the body and gate inputs. A high-resolution classifier that considers the confidence of the individual predictors is designed at the 45 nm technology node and operates at 100 MHz in the subthreshold region. To evaluate the performance of the classifier, a reduced MNIST dataset is generated by downsampling the MNIST digit images from 28 $\times$ 28 features to 9 $\times$ 9 features. The system is simulated across a wide range of PVT variations, exhibiting nominal accuracy of 90%, energy consumption of 6.2 pJ per classification (over 45 times lower than state-of-the-art classifiers), area of 2,179 $\mu$m$^{2}$ (over 7.3 times lower than state-of-the-art classifiers), and a stable response under PVT variations.