Venkata Kalyan Tavva

AR
4papers
Novelty45%
AI Score43

4 Papers

15.2ARApr 21
Efficient Page Migration in Hybrid Memory Systems

Upasna, Venkata Kalyan Tavva

Heterogeneous Memory Architecture (HMA) aims to optimize memory usage by leveraging a combination of memory types, such as high-bandwidth memory (HBM), commodity DRAM, and non-volatile memory (NVM), when utilized as main memory. To achieve maximum performance benefits, frequently accessed data pages are prioritized for storage in the faster HBM, while less frequently accessed pages are stored in slower memory types like DRAM or NVM. This enables a more efficient allocation of memory resources and improves overall system performance. In a Flat Address Space memory organization, all memory types, both fast and slow, are treated as a unified memory pool. This approach increases the overall memory capacity accessible to the system. In Flat Address Space organization, frequently accessed data pages may need to be remapped from slower memory to faster memory to improve memory access times. Such relocation requires changes to the data/states in the TLB (TLB shootdown) and the processor cache (cache line invalidations), leading to performance degradation. To address these inefficiencies, we propose a novel solution called Duon. The goal of Duon is to eliminate the overheads associated with page migration in systems using Extended TLB and Page Table. Specifically, our approach ensures that the updated mapping information for remapped pages is carefully stored directly in the TLB and page table itself. By doing so, the need for TLB shootdown and cache line invalidation after page migration is eliminated. Consequently, our proposal results in an overall improvement in IPC by 3.87% over existing state-of-the-art techniques, enhancing the efficiency and performance of heterogeneous memory systems. Further, our approach can work with any of the existing page migration policies and improve the performance.

37.7ARApr 20
Optimizing Branch Predictor for Graph Applications

Upasna, Venkata Kalyan Tavva

Real-world graph applications are generally larger than the size of the cache itself. Due to this reason, the memory hierarchy was identified as a key bottleneck by the earlier works. Undoubtedly, the performance can be achieved by improving cache, there is still a scope for performance gain by improving branch prediction accuracy. In graph processing applications, the occurrence of branch mispredictions is very frequent and is a major limitation for the overall performance. Within a program, there are different kinds of branches that recur throughout its execution. Although lots of branch predictors (BP) have been developed earlier to capture the static and dynamic behavior of branches. Branch predictors can yet be further optimized to handle the branches that cause mispredictions.

17.2ARMar 18
ReLMXEL: Adaptive RL-Based Memory Controller with Explainable Energy and Latency Optimization

Panuganti Chirag Sai, Gandholi Sarat, R. Raghunatha Sarma et al.

Reducing latency and energy consumption is critical to improving the efficiency of memory systems in modern computing. This work introduces ReLMXEL (Reinforcement Learning for Memory Controller with Explainable Energy and Latency Optimization), a explainable multi-agent online reinforcement learning framework that dynamically optimizes memory controller parameters using reward decomposition. ReLMXEL operates within the memory controller, leveraging detailed memory behavior metrics to guide decision-making. Experimental evaluations across diverse workloads demonstrate consistent performance gains over baseline configurations, with refinements driven by workload-specific memory access behaviour. By incorporating explainability into the learning process, ReLMXEL not only enhances performance but also increases the transparency of control decisions, paving the way for more accountable and adaptive memory system designs.

11.7CRApr 27
RowHammer Vulnerability Counter (RVC): Redefining RowHammer Detection with Victim-Centric Tracking

Lavi Jain, Venkata Kalyan Tavva

The Rowhammer vulnerability poses an increasing challenge with newer generations of DRAM and aggressive technology scaling. Existing mitigation techniques, such as Graphene, Twice, and Hydra, primarily rely on tracking activation counts for each row and issuing refreshes when a row reaches a predefined tracking threshold. However, these methods have inherent limitations, including inefficiencies in identifying rows genuinely at risk of bit flips. In this paper, we propose a novel framework called Rowhammer Vulnerability Count (RVC), which shifts the focus from activation count tracking to evaluating a row's actual vulnerability to bit flips. By selectively issuing refreshes only to rows on the verge of experiencing bit flips, RVC drastically reduces unnecessary refresh operations. We also demonstrate that prior works have incorrectly set tracking thresholds, leading to security flaws. Our evaluation shows that RVC achieves 95 - 99.99% improvement in mitigation induced refreshes when compared to Graphene, with no additional space overhead. Furthermore, RVC improves energy efficiency and reduces average LLC latency by up to 76.91%, making it a highly efficient and scalable solution for addressing Rowhammer in modern DRAM systems. These findings establish RVC as a superior approach for preventing Rowhammer, outperforming existing methods in both accuracy and efficiency.