Adrian Cristal

LG
3papers
90citations
Novelty38%
AI Score38

3 Papers

6.0ARApr 18
Different Perspectives of Memory System Simulation

Pouya Esmaili-Dokht, Arash Yadegari, Victor Xirau et al.

Memory simulators are used to estimate application performance on advanced memory systems, yet they may exhibit significant discrepancies compared to real hardware. This paper investigates two key questions: (1) what causes these inaccuracies, and (2) how can simulators be properly validated to ensure reliable performance predictions. We propose a methodology that evaluates memory performance from three complementary perspectives: the memory simulator, the CPU-memory interface, and the application. Our analysis reveals that these perspectives can diverge substantially, with application-level performance often decoupled from internal simulator statistics. We identify the CPU-memory interface as the primary source of these inaccuracies. To address these problems, we implement a set of corrections and enhancements that improve the fidelity of integrated simulators. We evaluate these changes across multiple widely used simulators, including Ramulator, Ramulator 2, and DRAMsim3 integrated with ZSim. The results show that correcting interface-related issues is essential to achieve simulation outcomes that closely resemble actual system performance.

LGDec 26, 2019
On the Resilience of Deep Learning for Reduced-voltage FPGAs

Kamyar Givaki, Behzad Salami, Reza Hojabr et al.

Deep Neural Networks (DNNs) are inherently computation-intensive and also power-hungry. Hardware accelerators such as Field Programmable Gate Arrays (FPGAs) are a promising solution that can satisfy these requirements for both embedded and High-Performance Computing (HPC) systems. In FPGAs, as well as CPUs and GPUs, aggressive voltage scaling below the nominal level is an effective technique for power dissipation minimization. Unfortunately, bit-flip faults start to appear as the voltage is scaled down closer to the transistor threshold due to timing issues, thus creating a resilience issue. This paper experimentally evaluates the resilience of the training phase of DNNs in the presence of voltage underscaling related faults of FPGAs, especially in on-chip memories. Toward this goal, we have experimentally evaluated the resilience of LeNet-5 and also a specially designed network for CIFAR-10 dataset with different activation functions of Rectified Linear Unit (Relu) and Hyperbolic Tangent (Tanh). We have found that modern FPGAs are robust enough in extremely low-voltage levels and that low-voltage related faults can be automatically masked within the training iterations, so there is no need for costly software- or hardware-oriented fault mitigation techniques like ECC. Approximately 10% more training iterations are needed to fill the gap in the accuracy. This observation is the result of the relatively low rate of undervolting faults, i.e., <0.1\%, measured on real FPGA fabrics. We have also increased the fault rate significantly for the LeNet-5 network by randomly generated fault injection campaigns and observed that the training accuracy starts to degrade. When the fault rate increases, the network with Tanh activation function outperforms the one with Relu in terms of accuracy, e.g., when the fault rate is 30% the accuracy difference is 4.92%.

LGJun 14, 2018
On the Resilience of RTL NN Accelerators: Fault Characterization and Mitigation

Behzad Salami, Osman Unsal, Adrian Cristal

Machine Learning (ML) is making a strong resurgence in tune with the massive generation of unstructured data which in turn requires massive computational resources. Due to the inherently compute- and power-intensive structure of Neural Networks (NNs), hardware accelerators emerge as a promising solution. However, with technology node scaling below 10nm, hardware accelerators become more susceptible to faults, which in turn can impact the NN accuracy. In this paper, we study the resilience aspects of Register-Transfer Level (RTL) model of NN accelerators, in particular, fault characterization and mitigation. By following a High-Level Synthesis (HLS) approach, first, we characterize the vulnerability of various components of RTL NN. We observed that the severity of faults depends on both i) application-level specifications, i.e., NN data (inputs, weights, or intermediate), NN layers, and NN activation functions, and ii) architectural-level specifications, i.e., data representation model and the parallelism degree of the underlying accelerator. Second, motivated by characterization results, we present a low-overhead fault mitigation technique that can efficiently correct bit flips, by 47.3% better than state-of-the-art methods.