Alberto Garcia-Ortiz

AR
h-index: 26
6 papers
1 citation
Novelty: 47%
AI Score: 42

6 Papers

11.8 AR May 28
Design-Oriented Modeling of TSV Substrate Noise Coupling to Ring VCOs

Ilias Exouzidis, Alberto Garcia-Ortiz, George Floros et al.

Through-silicon vias (TSVs) enable dense vertical interconnects in 3D-IC and chiplet systems, but their metal-oxide-silicon structure introduces significant parasitic coupling paths that can degrade the spectral purity of sensitive RF blocks. This paper presents a compact, design-oriented methodology for assessing TSV-induced substrate noise in mixed-signal circuits. We derive a closed-form analytical three-port RLGC macromodel for a Signal-Ground TSV pair that explicitly exposes the substrate node. The methodology is validated using a three-stage Ring VCO designed in a 22 nm FD-SOI technology, where specific RF devices from the process design kit (PDK) provide direct access to the transistor substrate terminals for controlled noise injection. Multi-tone Harmonic Balance simulations in Spectre RF quantify the impact of TSV aggressors on the oscillator's output spectrum. The results indicate that an aggressor of 1 GHz, 0.5 V$_{pp}$ induces a primary sideband spur of -35.2 dBc. Sensitivity characterization reveals that the magnitude of these sideband spurs increases monotonically with the aggressor amplitude. Furthermore, frequency sweeps demonstrate a low-pass coupling response, where the induced spur magnitude decreases from -20.2 dBc at 500 MHz to -33.1 dBc at 2 GHz.
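The spur levels quoted in this abstract are in dBc, that is, decibels relative to the carrier. As a minimal sketch, the snippet below converts the reported figures into spur-to-carrier amplitude ratios and computes the roll-off across the frequency sweep. It assumes the spur and carrier are compared as voltage amplitudes into the same load (hence the factor of 20); the function names are illustrative and not taken from the paper.

```python
def dbc_to_amplitude_ratio(dbc: float) -> float:
    """Convert a spur level in dBc to a spur/carrier voltage-amplitude ratio.

    Assumes both tones are measured into the same impedance, so that
    dBc = 20 * log10(V_spur / V_carrier).
    """
    return 10.0 ** (dbc / 20.0)


def rolloff_db(dbc_low_freq: float, dbc_high_freq: float) -> float:
    """Attenuation (in dB) of the spur between two aggressor frequencies."""
    return dbc_low_freq - dbc_high_freq


# Values reported in the abstract for the frequency sweep.
SPUR_AT_500_MHZ_DBC = -20.2
SPUR_AT_2_GHZ_DBC = -33.1

ratio_500mhz = dbc_to_amplitude_ratio(SPUR_AT_500_MHZ_DBC)
ratio_2ghz = dbc_to_amplitude_ratio(SPUR_AT_2_GHZ_DBC)
sweep_rolloff = rolloff_db(SPUR_AT_500_MHZ_DBC, SPUR_AT_2_GHZ_DBC)

print(f"spur/carrier at 500 MHz: {ratio_500mhz:.4f}")
print(f"spur/carrier at 2 GHz:   {ratio_2ghz:.4f}")
print(f"roll-off over the sweep: {sweep_rolloff:.1f} dB")
```

Under that assumption, the spur amplitude is roughly a tenth of the carrier at 500 MHz and a few percent of it at 2 GHz, which is the low-pass behaviour the abstract describes.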

11.6 AR Apr 22
Enabling Mixed criticality applications for the Versal AI-Engines

Vincent Sprave, Martin Wilhelm, Daniele Passaretti et al.

Adaptive Systems-on-Chips (SoCs) are increasingly being used in mixed-criticality systems (MCSs), such as in autonomous driving, aviation and medical systems. In this context, AMD has proposed the Versal SoC, which has a heterogeneous architecture including, among other components, an Artificial Intelligence Engine (AIE), which is a 2D array of processors and memory tiles designed for AI and signal processing workloads. While this AIE offers significant potential for accelerating real-time data processing tasks, this potential has not yet been explored in the context of MCSs since individual tasks with different criticality levels cannot be dynamically assigned to tiles due to the static mapping of dataflow graphs and tasks. In this work, we propose a dynamic task dispatching infrastructure that enables task switching on the AIE at runtime. Based on this infrastructure, we present an MCS design that dynamically assigns tasks of different criticality to a pool of AIE tiles, depending on the criticality mode of the system. Our approach overcomes the limitations of static dataflow graph mappings and, for the first time, exploits the parallel processing capabilities of the AIE for MCSs. We also present a comprehensive timing analysis of the overhead introduced by the task dispatcher infrastructure, focusing on control logic, context switching and data copy operations. This analysis shows that these operations have low variance and are negligible compared to the overall execution time, demonstrating that our infrastructure is suitable for MCSs. Finally, we evaluate the proposed infrastructure using an autonomous driving workload with tasks that have variable execution times and different criticality levels. In this case study, we maximized AIE utilization, reducing idle time by 65.5% and doubling the throughput of low-criticality tasks, with a measured execution time overhead of less than 0.002%.

LG Nov 9, 2023
Exploiting Neural-Network Statistics for Low-Power DNN Inference

Lennart Bamberg, Ardalan Najafi, Alberto Garcia-Ortiz

Specialized compute blocks have been developed for efficient DNN execution. However, due to the vast amount of data and parameter movements, the interconnects and on-chip memories form another bottleneck, impairing power and performance. This work addresses this bottleneck by contributing a low-power technique for edge-AI inference engines that combines overhead-free coding with a statistical analysis of the data and parameters of neural networks. Our approach reduces the interconnect and memory power consumption by up to 80% for state-of-the-art benchmarks while providing additional power savings of up to 39% for the compute blocks. These power improvements are achieved with no loss of accuracy and negligible hardware cost.

LG Oct 13, 2025
Rescaling-Aware Training for Efficient Deployment of Deep Learning Models on Full-Integer Hardware

Lion Mueller, Alberto Garcia-Ortiz, Ardalan Najafi et al.

Integer AI inference significantly reduces computational complexity in embedded systems. Quantization-aware training (QAT) helps mitigate the accuracy degradation associated with post-training quantization but still overlooks the impact of integer rescaling during inference, which is a hardware-costly operation in integer-only AI inference. This work shows that rescaling cost can be dramatically reduced post-training by applying a stronger quantization to the rescale multiplicands at no model-quality loss. Furthermore, we introduce Rescale-Aware Training, a fine-tuning method for ultra-low bit-width rescaling multiplicands. Experiments show that even with 8x reduced rescaler widths, the full accuracy is preserved through minimal incremental retraining. This enables more energy-efficient and cost-efficient AI inference for resource-constrained embedded systems.
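To make the rescaling step concrete, the following is a minimal sketch of the generic fixed-point rescale commonly used in integer-only inference, where a real-valued scale is approximated as an integer multiplicand times a power of two. It is not the paper's method or training procedure. The function names, the example scale of 0.0037, and the 32-bit and 4-bit widths (chosen only to illustrate an 8x width reduction) are assumptions made for this illustration.

```python
import math


def quantize_multiplier(scale: float, bits: int) -> tuple[int, int]:
    """Approximate a positive real scale as m * 2**-shift.

    The integer multiplicand m fits in `bits` unsigned bits. Fewer bits mean
    a narrower hardware multiplier but a coarser approximation of the scale.
    """
    if scale <= 0.0 or bits < 1:
        raise ValueError("scale must be positive and bits must be >= 1")
    mantissa, exponent = math.frexp(scale)  # scale = mantissa * 2**exponent
    m = round(mantissa * (1 << bits))
    shift = bits - exponent
    if m == (1 << bits):  # mantissa rounded up to 1.0: renormalize
        m >>= 1
        shift -= 1
    return m, shift


def rescale(acc: int, m: int, shift: int) -> int:
    """Rescale an integer accumulator using one multiply, one add, one shift."""
    product = acc * m
    if shift <= 0:
        return product << -shift
    # Adding half of the divisor before shifting rounds to nearest.
    return (product + (1 << (shift - 1))) >> shift


# Hypothetical example values, not taken from the paper.
example_scale = 0.0037
example_acc = 12345

for width in (32, 4):
    m, shift = quantize_multiplier(example_scale, width)
    approx = m / (1 << shift) if shift >= 0 else m * (1 << -shift)
    rel_err = abs(approx - example_scale) / example_scale
    print(f"{width:2d}-bit multiplicand: m={m}, shift={shift}, "
          f"relative scale error={rel_err:.2e}, "
          f"rescaled={rescale(example_acc, m, shift)}")
```

The narrower multiplicand introduces a small systematic error in the effective scale. The abstract's claim is that a model can tolerate such errors or be fine-tuned to absorb them, which this sketch does not attempt to reproduce.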

AR Jun 1, 2025
VUSA: Virtually Upscaled Systolic Array Architecture to Exploit Unstructured Sparsity in AI Acceleration

Shereef Helal, Alberto Garcia-Ortiz, Lennart Bamberg

Leveraging high degrees of unstructured sparsity is a promising approach to enhance the efficiency of deep neural network (DNN) accelerators, which is particularly important for emerging Edge-AI applications. We introduce VUSA, a systolic-array architecture that virtually grows based on the present sparsity to perform larger matrix multiplications with the same number of physical multiply-accumulate (MAC) units. The proposed architecture achieves savings of 37% and 68% in area and power efficiency, respectively, at the same peak performance, compared to a baseline systolic-array architecture in a commercial 16-nm technology. Still, the proposed architecture supports acceleration for any DNN with any sparsity, even no sparsity at all. Thus, the proposed architecture is application-independent, making it viable for general-purpose AI acceleration.

LG Mar 26, 2025
Including local feature interactions in deep non-negative matrix factorization networks improves performance

Mahbod Nouri, David Rotermund, Alberto Garcia-Ortiz et al.

The brain uses positive signals as a means of signaling. Forward interactions in the early visual cortex are also positive, realized by excitatory synapses. Only local interactions also include inhibition. Non-negative matrix factorization (NMF) captures the biological constraint of positive long-range interactions and can be implemented with stochastic spikes. While NMF can serve as an abstract formalization of early neural processing in the visual system, the performance of deep convolutional networks with NMF modules does not match that of CNNs of similar size. However, when the local NMF modules are each followed by a module that mixes the NMF's positive activities, the performance on the benchmark data exceeds that of vanilla deep convolutional networks of similar size. This setting can be considered a biologically more plausible emulation of the processing in cortical (hyper-)columns with the potential to improve the performance of deep networks.
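For readers unfamiliar with NMF, the sketch below shows the classic multiplicative-update rule (Lee and Seung) for factorizing a non-negative matrix V into non-negative factors W and H. This is the textbook algorithm, swapped in for illustration. It is not the deep NMF network or the spike-based implementation described in the abstract. The helper names and the small example matrix are assumptions made for this example.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Frobenius norm of the reconstruction error V - W H."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rwh in zip(v, wh) for x, y in zip(rv, rwh)) ** 0.5


def scaled(mat, num, den, eps):
    """Element-wise mat * num / (den + eps); keeps every entry non-negative."""
    return [[m * n / (d + eps) for m, n, d in zip(rm, rn, rd)]
            for rm, rn, rd in zip(mat, num, den)]


def nmf(v, rank, steps=200, eps=1e-9, seed=0):
    """Factorize non-negative V (n x m) into W (n x rank) and H (rank x m)."""
    rng = random.Random(seed)
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in v]
    h = [[rng.random() + 0.1 for _ in v[0]] for _ in range(rank)]
    for _ in range(steps):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        h = scaled(h, matmul(wt, v), matmul(matmul(wt, w), h), eps)
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        w = scaled(w, matmul(v, ht), matmul(w, matmul(h, ht)), eps)
    return w, h


# Hypothetical non-negative example matrix of rank 2.
example_v = [[1.0, 2.0, 3.0],
             [2.0, 4.0, 6.0],
             [1.0, 0.0, 1.0],
             [3.0, 2.0, 5.0]]

w_init, h_init = nmf(example_v, rank=2, steps=0)
w_fit, h_fit = nmf(example_v, rank=2, steps=200)
print("initial error:", frobenius_error(example_v, w_init, h_init))
print("final error:  ", frobenius_error(example_v, w_fit, h_fit))
```

Because each update only multiplies by ratios of non-negative quantities, W and H never become negative, which mirrors the positivity constraint the abstract attributes to long-range interactions.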