Guru Venkataramani

CR
h-index: 24
12 papers
161 citations
Novelty: 48%
AI Score: 27

12 Papers

LG · Oct 12, 2023
Every Parameter Matters: Ensuring the Convergence of Federated Learning with Dynamic Heterogeneous Models Reduction

Hanhan Zhou, Tian Lan, Guru Venkataramani et al.

Cross-device Federated Learning (FL) faces significant challenges where low-end clients that could potentially make unique contributions are excluded from training large models due to their resource bottlenecks. Recent research efforts have focused on model-heterogeneous FL, by extracting reduced-size models from the global model and applying them to local clients accordingly. Despite the empirical success, general theoretical guarantees of convergence on this method remain an open question. This paper presents a unifying framework for heterogeneous FL algorithms with online model extraction and provides a general convergence analysis for the first time. In particular, we prove that under certain sufficient conditions and for both IID and non-IID data, these algorithms converge to a stationary point of standard FL for general smooth cost functions. Moreover, we introduce the concept of minimum coverage index, together with model reduction noise, which will determine the convergence of heterogeneous federated learning, and therefore we advocate for a holistic approach that considers both factors to enhance the efficiency of heterogeneous federated learning.

LG · Feb 21, 2023
MAC-PO: Multi-Agent Experience Replay via Collective Priority Optimization

Yongsheng Mei, Hanhan Zhou, Tian Lan et al.

Experience replay is crucial for off-policy reinforcement learning (RL) methods. By remembering and reusing experiences from different past policies, experience replay significantly improves the training efficiency and stability of RL algorithms. Many decision-making problems in practice naturally involve multiple agents and require multi-agent reinforcement learning (MARL) under the centralized training, decentralized execution paradigm. Nevertheless, existing MARL algorithms often adopt standard experience replay, where transitions are sampled uniformly regardless of their importance. Finding prioritized sampling weights that are optimized for MARL experience replay has yet to be explored. To this end, we propose MAC-PO, which formulates optimal prioritized experience replay for multi-agent problems as a regret minimization over the sampling weights of transitions. This optimization is relaxed and solved using the Lagrangian multiplier approach to obtain closed-form optimal sampling weights. By minimizing the resulting policy regret, we can narrow the gap between the current policy and a nominal optimal policy, thus acquiring an improved prioritization scheme for multi-agent tasks. Our experimental results on the Predator-Prey and StarCraft Multi-Agent Challenge environments demonstrate the effectiveness of our method, which replays important transitions more effectively and outperforms other state-of-the-art baselines.

CV · Feb 6, 2023
Exploiting Partial Common Information Microstructure for Multi-Modal Brain Tumor Segmentation

Yongsheng Mei, Guru Venkataramani, Tian Lan

Learning with multiple modalities is crucial for automated brain tumor segmentation from magnetic resonance imaging data. Explicitly optimizing the common information shared among all modalities (e.g., by maximizing the total correlation) has been shown to achieve better feature representations and thus enhance the segmentation performance. However, existing approaches are oblivious to partial common information shared by subsets of the modalities. In this paper, we show that identifying such partial common information can significantly boost the discriminative power of image segmentation models. In particular, we introduce a novel concept of partial common information mask (PCI-mask) to provide a fine-grained characterization of what partial common information is shared by which subsets of the modalities. By solving a masked correlation maximization and simultaneously learning an optimal PCI-mask, we identify the latent microstructure of partial common information and leverage it in a self-attention module to selectively weight different feature representations in multi-modal data. We implement our proposed framework on the standard U-Net. Our experimental results on the Multi-modal Brain Tumor Segmentation Challenge (BraTS) datasets outperform those of state-of-the-art segmentation baselines, with validation Dice similarity coefficients of 0.920, 0.897, 0.837 for the whole tumor, tumor core, and enhancing tumor on BraTS-2020.
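The Dice similarity coefficient reported above is the standard overlap metric 2|P ∩ T| / (|P| + |T|) between a predicted mask P and a reference mask T. The following is a minimal sketch of that metric on flattened binary masks. The function name, the toy masks, and the convention of returning 1.0 for two empty masks are illustrative assumptions, not the authors' evaluation code.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice = 2|P ∩ T| / (|P| + |T|) for flattened binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    # Voxels labeled positive in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Total number of positive voxels across the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    return 2.0 * intersection / total


# Toy masks (hypothetical data): 2 overlapping voxels, 3 positives in each mask.
pred_mask = [1, 1, 0, 0, 1, 0]
true_mask = [1, 0, 0, 0, 1, 1]
score = dice_coefficient(pred_mask, true_mask)
print(round(score, 4))
```

In a BraTS-style evaluation, a score like this would typically be computed separately for each region (whole tumor, tumor core, enhancing tumor).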

LG · May 7, 2024
SwiftRL: Towards Efficient Reinforcement Learning on Real Processing-In-Memory Systems

Kailash Gogineni, Sai Santosh Dayapule, Juan Gómez-Luna et al.

Reinforcement Learning (RL) trains agents to learn optimal behavior by maximizing reward signals from experience datasets. However, RL training often faces memory limitations, leading to execution latencies and prolonged training times. To overcome this, SwiftRL explores Processing-In-Memory (PIM) architectures to accelerate RL workloads. We achieve near-linear performance scaling by implementing RL algorithms like Tabular Q-learning and SARSA on UPMEM PIM systems and optimizing for hardware. Our experiments on OpenAI GYM environments using UPMEM hardware demonstrate superior performance compared to CPU and GPU implementations.
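For reference, the two algorithms named above differ only in how they bootstrap from the next state. The sketch below shows the textbook tabular update rules in plain Python on a CPU. It is not the UPMEM PIM implementation, and the learning rate, discount factor, and toy Q-table are assumed values chosen for illustration.

```python
from typing import List

QTable = List[List[float]]  # q[state][action]


def q_learning_update(q: QTable, s: int, a: int, r: float, s_next: int,
                      alpha: float = 0.1, gamma: float = 0.9) -> None:
    """Off-policy update: bootstrap from the best action in the next state."""
    best_next = max(q[s_next])
    q[s][a] += alpha * (r + gamma * best_next - q[s][a])


def sarsa_update(q: QTable, s: int, a: int, r: float, s_next: int, a_next: int,
                 alpha: float = 0.1, gamma: float = 0.9) -> None:
    """On-policy update: bootstrap from the action actually taken next."""
    q[s][a] += alpha * (r + gamma * q[s_next][a_next] - q[s][a])


# Toy two-state, two-action table (hypothetical values).
q_ql = [[0.0, 0.0], [1.0, 2.0]]
q_sarsa = [[0.0, 0.0], [1.0, 2.0]]

# The same transition (s=0, a=1, r=1.0, s'=1) applied under both rules.
q_learning_update(q_ql, s=0, a=1, r=1.0, s_next=1)
sarsa_update(q_sarsa, s=0, a=1, r=1.0, s_next=1, a_next=0)

print(q_ql[0][1], q_sarsa[0][1])
```

Each update touches only a couple of table entries, so the workload is dominated by memory accesses rather than arithmetic, which is consistent with the memory limitations the abstract cites as motivation.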

CR · Feb 1, 2022
A Framework for Server Authentication using Communication Protocol Dialects

Kailash Gogineni, Yongsheng Mei, Guru Venkataramani et al.

In today's world, computer networks have become vulnerable to numerous attacks. In both wireless and wired networks, man-in-the-middle attacks are among the most common, and session hijacking and context confusion attacks are the most frequently attempted of these. A potential attacker may have enough time to launch an attack targeting these vulnerabilities (such as rerouting the target request to a malicious server or hijacking the traffic). A viable strategy to solve this problem is to dynamically change the system properties and configurations and create unique fingerprints to identify the source. However, existing work on fingerprinting mainly focuses on lower-level properties (e.g., IP address), and mutation is restricted to these types of properties. We develop a novel system, called Verify-Pro, that provides server authentication using communication protocol dialects and uses a client-server architecture based on network protocols to customize the communication transactions. For each session, a particular sequence of handshakes is used as dialects. Given this context, once a one-time username and password are established, we use the dialects as an authentication mechanism for each request (e.g., get filename in FTP) throughout the session, which enforces continuous authentication. Specifically, we leverage a machine learning approach on both client and server machines to trigger a specific dialect that dynamically changes for each request. We implement a prototype of Verify-Pro and evaluate its practicality on the standard communication protocols FTP and HTTP and the Internet of Things protocol MQTT. Our experimental results show that when an attacker sends misleading information through message packets at the application layer, the recipient can identify whether the sender is genuine or spoofed, with a negligible overhead of 0.536%.

LG · Jan 27, 2022
On the Convergence of Heterogeneous Federated Learning with Arbitrary Adaptive Online Model Pruning

Hanhan Zhou, Tian Lan, Guru Venkataramani et al.

One of the biggest challenges in Federated Learning (FL) is that client devices often have drastically different computation and communication resources for local updates. To this end, recent research efforts have focused on training heterogeneous local models obtained by pruning a shared global model. Despite empirical success, theoretical guarantees on convergence remain an open question. In this paper, we present a unifying framework for heterogeneous FL algorithms with arbitrary adaptive online model pruning and provide a general convergence analysis. In particular, we prove that under certain sufficient conditions and on both IID and non-IID data, these algorithms converge to a stationary point of standard FL for general smooth cost functions, with a convergence rate of $O(\frac{1}{\sqrt{Q}})$. Moreover, we illuminate two key factors impacting convergence: pruning-induced noise and minimum coverage index, advocating a joint design of local pruning masks for efficient training.
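To make the role of pruning masks concrete, here is a minimal sketch of one aggregation round over flat parameter vectors. The abstract does not define the minimum coverage index, so two things here are assumptions rather than the paper's formulation: the index is taken to be the smallest number of local models that retain any single parameter, and each parameter is aggregated by averaging only over the clients that kept it. All function names and toy values are hypothetical.

```python
from typing import List


def min_coverage_index(masks: List[List[int]]) -> int:
    """Smallest number of clients that retain any single parameter (assumed definition)."""
    num_params = len(masks[0])
    return min(sum(mask[i] for mask in masks) for i in range(num_params))


def aggregate(global_params: List[float],
              local_params: List[List[float]],
              masks: List[List[int]]) -> List[float]:
    """Average each parameter over the clients whose mask kept it (assumed rule)."""
    updated = []
    for i, old_value in enumerate(global_params):
        kept = [params[i] for params, mask in zip(local_params, masks) if mask[i]]
        # A parameter that no client trained keeps its previous global value.
        updated.append(sum(kept) / len(kept) if kept else old_value)
    return updated


# Three clients with different pruning masks over four parameters (toy data).
masks = [
    [1, 1, 1, 1],  # high-end client trains the full model
    [1, 1, 0, 0],  # pruned local model
    [1, 0, 1, 0],  # differently pruned local model
]
local_params = [
    [1.0, 2.0, 3.0, 4.0],
    [3.0, 4.0, 0.0, 0.0],
    [5.0, 0.0, 7.0, 0.0],
]
coverage = min_coverage_index(masks)
new_global = aggregate([0.0, 0.0, 0.0, 0.0], local_params, masks)
print(coverage, new_global)
```

In this toy round the last parameter is trained by only one client, so the coverage index is 1. Under the assumed definition, this illustrates why the abstract argues for designing local masks jointly instead of pruning each client independently.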

CV · Nov 23, 2021
PT-VTON: an Image-Based Virtual Try-On Network with Progressive Pose Attention Transfer

Hanhan Zhou, Tian Lan, Guru Venkataramani

The virtual try-on system has gained great attention due to its potential to give customers a realistic, personalized product presentation in virtualized settings. In this paper, we present PT-VTON, a novel pose-transfer-based framework for cloth transfer that enables virtual try-on with arbitrary poses. PT-VTON can be applied to the fashion industry with minimal modification of existing systems while satisfying the overall visual fashionability and detailed fabric appearance requirements. It enables efficient clothes transfer between model and user images with arbitrary pose and body shape. We implement a prototype of PT-VTON and demonstrate that our system can match or surpass many other approaches when facing a drastic variation of poses by preserving detailed human and fabric characteristic appearances. PT-VTON is shown to outperform alternative approaches both on machine-based quantitative metrics and qualitative results.

CR · Oct 7, 2021
MPD: Moving Target Defense through Communication Protocol Dialects

Yongsheng Mei, Kailash Gogineni, Tian Lan et al.

Communication protocol security is among the most significant challenges of the Internet of Things (IoT) due to the wide variety of hardware and software technologies involved. Moving target defense (MTD) has been adopted as an innovative strategy to solve this problem by dynamically changing target system properties and configurations to obfuscate the attack surface. Nevertheless, existing work on MTD primarily focuses on lower-level properties (e.g., IP addresses or port numbers), and only a limited number of variations can be generated based on these properties. In this paper, we propose a new approach to MTD through communication protocol dialects (MPD), which dynamically customizes a communication protocol into various protocol dialects and leverages them to create a moving target defense. Specifically, MPD harnesses a dialect generating function to create protocol dialects and then a mapping function to select one specific dialect for each packet during communication. To keep different network entities in synchronization, we also design a self-synchronization mechanism utilizing a pseudo-random number generator with the input of a pre-shared secret key and previously sent packets. We implement a prototype of MPD and evaluate its feasibility on a standard network protocol (i.e., File Transfer Protocol) and an Internet of Things protocol (i.e., Message Queuing Telemetry Transport). The results indicate that MPD can create a moving target defense with protocol dialects to effectively address various attacks, including the denial of service attack and malicious packet modifications, with negligible overhead.
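The self-synchronization idea can be sketched as follows: both endpoints derive the next dialect index from the pre-shared key and the previously sent packet, so they agree without exchanging any extra messages. The sketch substitutes HMAC-SHA256 for the paper's pseudo-random number generator, which is an assumption made here for illustration. The dialect names, the empty initial packet, and the sample packets are likewise hypothetical.

```python
import hashlib
import hmac
from typing import List

# Hypothetical dialect identifiers; the real dialect generating function is not shown.
DIALECTS = ["dialect-A", "dialect-B", "dialect-C", "dialect-D"]


def select_dialect(secret_key: bytes, previous_packet: bytes, num_dialects: int) -> int:
    """Map (pre-shared key, previous packet) to a dialect index.

    HMAC-SHA256 is a stand-in for the PRNG described in the abstract.
    """
    digest = hmac.new(secret_key, previous_packet, hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big") % num_dialects


def run_session(secret_key: bytes, packets: List[bytes]) -> List[int]:
    """Return the dialect index chosen for each packet in a session."""
    previous = b""  # assumed initial state before any packet has been sent
    chosen = []
    for packet in packets:
        chosen.append(select_dialect(secret_key, previous, len(DIALECTS)))
        previous = packet
    return chosen


key = b"pre-shared-secret"
packets = [b"USER alice", b"PASS example", b"RETR report.txt"]

# Sender and receiver compute the schedule independently from the same inputs.
sender_view = run_session(key, packets)
receiver_view = run_session(key, packets)
print([DIALECTS[i] for i in sender_view])
```

Because each choice depends on the preceding packet, an endpoint that lacks the key or has seen a different packet history will generally compute a different schedule from the legitimate peers.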

SE · Feb 24, 2021
Integrated Reasoning Engine for Pointer-related Code Clone Detection

Hongfa Xue, Yongsheng Mei, Kailash Gogineni et al.

Detecting similar code fragments, usually referred to as code clones, is an important task. In particular, code clone detection can have significant uses in the context of vulnerability discovery, refactoring, and plagiarism detection. However, false positives are inevitable and always require manual review. In this paper, we propose Twin-Finder+, a novel closed-loop approach for pointer-related code clone detection that integrates machine learning and symbolic execution techniques to achieve precision. Twin-Finder+ introduces a formal verification mechanism to automate this manual review process. Our experimental results show that Twin-Finder+ can remove 91.69% of false positives on average. We further conduct a security analysis for memory safety using real-world applications, Links version 2.14 and libreOffice-6.0.0.1. Twin-Finder+ is able to find 6 unreported bugs in Links version 2.14 and one publicly patched bug in libreOffice-6.0.0.1.

SE · Nov 1, 2019
Twin-Finder: Integrated Reasoning Engine for Pointer-related Code Clone Detection

Hongfa Xue, Yongsheng Mei, Kailash Gogineni et al.

Detecting code clones is crucial in various software engineering tasks. In particular, code clone detection can have significant uses in the context of analyzing and fixing bugs in large-scale applications. However, prior works, such as machine learning-based clone detection, may produce a considerable number of false positives. In this paper, we propose Twin-Finder, a novel, closed-loop approach for pointer-related code clone detection that integrates machine learning and symbolic execution techniques to achieve precision. Twin-Finder introduces a clone verification mechanism to formally verify whether two clone samples are indeed clones and a feedback loop to automatically generate formal rules that tune the machine learning algorithm and further reduce false positives. Our experimental results show that Twin-Finder can swiftly identify up to 9X more code clones compared to a tree-based clone detector, Deckard, and remove an average of 91.69% of false positives.

CR · Feb 13, 2019
Towards a Better Indicator for Cache Timing Channels

Fan Yao, Hongyu Fang, Milos Doroslovacki et al.

Recent studies highlighting the vulnerability of computer architecture to information leakage attacks have been a cause of significant concern. Among the various classes of microarchitectural attacks, cache timing channels are especially worrisome since they have the potential to compromise users' private data at high bit rates. Prior works have demonstrated the use of cache miss patterns to detect these attacks. We find that cache miss traces can be easily spoofed and thus they may not be able to identify smarter adversaries. In this work, we show that cache occupancy, which records the number of cache blocks owned by a specific process, can be leveraged as a stronger indicator for the presence of cache timing channels. We observe that the modulation of cache access latency in timing channels can be recognized through analyzing pairwise cache occupancy patterns. Our experimental results show that cache occupancy patterns cannot be easily obfuscated even by advanced adversaries that successfully evade cache miss-based detection.

CR · Feb 9, 2019
Architecting Non-Volatile Main Memory to Guard Against Persistence-based Attacks

Fan Yao, Guru Venkataramani

DRAM-based main memory and its associated components increasingly account for a significant portion of application performance bottlenecks and power budget demands inside the computing ecosystem. To alleviate the problems of storage density and power constraints associated with DRAM, system architects are investigating alternative non-volatile memory technologies such as Phase Change Memory (PCM) to either replace or be used alongside DRAM. While such alternative memory types promise to overcome the DRAM-related issues, they present a significant security threat to users because memory data persists even after power down. In this paper, we investigate smart mechanisms to obscure the data left in non-volatile memory after power down. In particular, we analyze the effect of using a single encryption algorithm versus differentiated encryption based on the security needs of the application phases. We also explore the effect of encryption on a hybrid main memory that has a DRAM buffer cache plus PCM main memory. Our mechanism takes into account the limited write endurance problem associated with several non-volatile memory technologies including PCM, and avoids any additional writes beyond those originally issued by the applications. We evaluate our proposed design using the Gem5 simulator and SPEC 2006 applications and report its performance and power overheads.