ARJul 7, 2025
Accelerating GenAI Workloads by Enabling RISC-V Microkernel Support in IREEAdeel Ahmad, Ahmad Tameem Kamal, Nouman Amir et al.
This project enables RISC-V microkernel support in IREE, an MLIR-based machine learning compiler and runtime. The approach begins by enabling the lowering of MLIR linalg dialect contraction ops to linalg.mmt4d op for the RISC-V64 target within the IREE pass pipeline, followed by the development of optimized microkernels for RISC-V. The performance gains are compared with upstream IREE and Llama.cpp for the Llama-3.2-1B-Instruct model.
AIMay 12, 2025
QuantX: A Framework for Hardware-Aware Quantization of Generative AI WorkloadsMuhammad Ahmad, Khurram Mazher, Saqib Akram et al.
We present QuantX: a tailored suite of recipes for LLM and VLM quantization. It is capable of quantizing down to 3-bit resolutions with minimal loss in performance. The quantization strategies in QuantX take into account hardware-specific constraints to achieve efficient dequantization during inference ensuring flexible trade-off between runtime speed, memory requirement and model accuracy. Our results demonstrate that QuantX achieves performance within 6% of the unquantized model for LlaVa-v1.6 quantized down to 3-bits for multiple end user tasks and outperforms recently published state-of-the-art quantization techniques. We further integrate one particular technique from QuantX into the popular Llama.cpp framework and show its feasibility in terms of runtime compared to the mainstream quantization techniques from Llama.cpp. Lastly, this manuscript provides insights into the LLM quantization process that motivated the range of recipes and options that are incorporated in QuantX.
CRMar 30, 2017
High Efficiency Power Side-Channel Attack Immunity using Noise Injection in Attenuated Signature DomainDebayan Das, Shovan Maity, Saad Bin Nasir et al.
With the advancement of technology in the last few decades, leading to the widespread availability of miniaturized sensors and internet-connected things (IoT), security of electronic devices has become a top priority. Side-channel attack (SCA) is one of the prominent methods to break the security of an encryption system by exploiting the information leaked from the physical devices. Correlational power attack (CPA) is an efficient power side-channel attack technique, which analyses the correlation between the estimated and measured supply current traces to extract the secret key. The existing countermeasures to the power attacks are mainly based on reducing the SNR of the leaked data, or introducing large overhead using techniques like power balancing. This paper presents an attenuated signature AES (AS-AES), which resists SCA with minimal noise current overhead. AS-AES uses a shunt low-drop-out (LDO) regulator to suppress the AES current signature by 400x in the supply current traces. The shunt LDO has been fabricated and validated in 130 nm CMOS technology. System-level implementation of the AS-AES along with noise injection, shows that the system remains secure even after 50K encryptions, with 10x reduction in power overhead compared to that of noise addition alone.