Moinuddin K. Qureshi

h-index52

11papers

637citations

Novelty68%

AI Score44

Ranked #47,533 of 194,257 authors (top 24%)#1,048 in CR (top 15%)

11 Papers

19.5LGSep 12, 2022

Cocktail Party Attack: Breaking Aggregation-Based Privacy in Federated Learning using Independent Component Analysis

Sanjay Kariyappa, Chuan Guo, Kiwan Maeng et al.

Federated learning (FL) aims to perform privacy-preserving machine learning on distributed data held by multiple data owners. To this end, FL requires the data owners to perform training locally and share the gradient updates (instead of the private inputs) with the central server, which are then securely aggregated over multiple data owners. Although aggregation by itself does not provably offer privacy protection, prior work showed that it may suffice if the batch size is sufficiently large. In this paper, we propose the Cocktail Party Attack (CPA) that, contrary to prior belief, is able to recover the private inputs from gradients aggregated over a very large batch size. CPA leverages the crucial insight that aggregate gradients from a fully connected layer is a linear combination of its inputs, which leads us to frame gradient inversion as a blind source separation (BSS) problem (informally called the cocktail party problem). We adapt independent component analysis (ICA)--a classic solution to the BSS problem--to recover private inputs for fully-connected and convolutional networks, and show that CPA significantly outperforms prior gradient inversion attacks, scales to ImageNet-sized inputs, and works on large batch sizes of up to 1024.

5.1QUANT-PHOct 31, 2022

FrozenQubits: Boosting Fidelity of QAOA by Skipping Hotspot Nodes

Ramin Ayanzadeh, Narges Alavisamani, Poulami Das et al.

Quantum Approximate Optimization Algorithm (QAOA) is one of the leading candidates for demonstrating the quantum advantage using near-term quantum computers. Unfortunately, high device error rates limit us from reliably running QAOA circuits for problems with more than a few qubits. In QAOA, the problem graph is translated into a quantum circuit such that every edge corresponds to two 2-qubit CNOT operations in each layer of the circuit. As CNOTs are extremely error-prone, the fidelity of QAOA circuits is dictated by the number of edges in the problem graph. We observe that majority of graphs corresponding to real-world applications follow the ``power-law`` distribution, where some hotspot nodes have significantly higher number of connections. We leverage this insight and propose ``FrozenQubits`` that freezes the hotspot nodes or qubits and intelligently partitions the state-space of the given problem into several smaller sub-spaces which are then solved independently. The corresponding QAOA sub-circuits are significantly less vulnerable to gate and decoherence errors due to the reduced number of CNOT operations in each sub-circuit. Unlike prior circuit-cutting approaches, FrozenQubits does not require any exponentially complex post-processing step. Our evaluations with 5,300 QAOA circuits on eight different quantum computers from IBM shows that FrozenQubits can improve the quality of solutions by 8.73x on average (and by up to 57x), albeit utilizing 2x more quantum resources.

10.8QUANT-PHJan 17, 2024Code

Élivágar: Efficient Quantum Circuit Search for Classification

Sashwat Anagolum, Narges Alavisamani, Poulami Das et al.

Designing performant and noise-robust circuits for Quantum Machine Learning (QML) is challenging -- the design space scales exponentially with circuit size, and there are few well-supported guiding principles for QML circuit design. Although recent Quantum Circuit Search (QCS) methods attempt to search for performant QML circuits that are also robust to hardware noise, they directly adopt designs from classical Neural Architecture Search (NAS) that are misaligned with the unique constraints of quantum hardware, resulting in high search overheads and severe performance bottlenecks. We present Élivágar, a novel resource-efficient, noise-guided QCS framework. Élivágar innovates in all three major aspects of QCS -- search space, search algorithm and candidate evaluation strategy -- to address the design flaws in current classically-inspired QCS methods. Élivágar achieves hardware-efficiency and avoids an expensive circuit-mapping co-search via noise- and device topology-aware candidate generation. By introducing two cheap-to-compute predictors, Clifford noise resilience and Representational capacity, Élivágar decouples the evaluation of noise robustness and performance, enabling early rejection of low-fidelity circuits and reducing circuit evaluation costs. Due to its resource-efficiency, Élivágar can further search for data embeddings, significantly improving performance. Based on a comprehensive evaluation of Élivágar on 12 real quantum devices and 9 QML applications, Élivágar achieves 5.3% higher accuracy and a 271$\times$ speedup compared to state-of-the-art QCS methods.

6.6DCJun 17, 2025

Utility-Driven Speculative Decoding for Mixture-of-Experts

Anish Saxena, Po-An Tsai, Hritvik Taneja et al.

GPU memory bandwidth is the main bottleneck for low-latency Large Language Model (LLM) inference. Speculative decoding leverages idle GPU compute by using a lightweight drafter to propose K tokens, which the LLM verifies in parallel, boosting token throughput. In conventional dense LLMs, all model weights are fetched each iteration, so speculation adds no latency overhead. Emerging Mixture of Experts (MoE) models activate only a subset of weights per token, greatly reducing data movement. However, we show that speculation is ineffective for MoEs: draft tokens collectively activate more weights, increasing data movement and verification time by 2-3x. When token throughput gains fail to offset this overhead, speculation causes slowdowns up to 1.5x, making it infeasible. Even when useful, the optimal K varies by task, model, and even between requests and iterations. Thus, despite widespread use in dense LLMs, speculation remains impractical in leading MoEs. We present Cascade, a utility-driven framework that selectively enables speculation to avoid slowdowns and dynamically tunes K to accelerate MoE serving. Cascade uses a lightweight metric, speculation utility, the ratio of token gains to verification cost, which shows iteration-level locality, enabling periodic decisions via short test and longer set phases. For each request, Cascade disables speculation if utility drops below one during testing, and when utility exceeds one, tests multiple K-values to choose the utility-maximizing K for the set phase. We implement Cascade in vLLM and evaluate it on five popular MoEs with workloads spanning code, math, extraction, and mixed tasks. Cascade limits slowdown to 5% (vs. 1.5x) and improves throughput by 7-14% over static K, making speculative decoding practical for MoEs.

17.9CRNov 25, 2021

ExPLoit: Extracting Private Labels in Split Learning

Sanjay Kariyappa, Moinuddin K Qureshi

Split learning is a popular technique used for vertical federated learning (VFL), where the goal is to jointly train a model on the private input and label data held by two parties. This technique uses a split-model, trained end-to-end, by exchanging the intermediate representations (IR) of the inputs and gradients of the IR between the two parties. We propose ExPLoit - a label-leakage attack that allows an adversarial input-owner to extract the private labels of the label-owner during split-learning. ExPLoit frames the attack as a supervised learning problem by using a novel loss function that combines gradient-matching and several regularization terms developed using key properties of the dataset and models. Our evaluations show that ExPLoit can uncover the private labels with near-perfect accuracy of up to 99.96%. Our findings underscore the need for better training techniques for VFL.

3.8CRApr 6, 2021

Enabling Inference Privacy with Adaptive Noise Injection

Sanjay Kariyappa, Ousmane Dia, Moinuddin K Qureshi

User-facing software services are becoming increasingly reliant on remote servers to host Deep Neural Network (DNN) models, which perform inference tasks for the clients. Such services require the client to send input data to the service provider, who processes it using a DNN and returns the output predictions to the client. Due to the rich nature of the inputs such as images and speech, the input often contains more information than what is necessary to perform the primary inference task. Consequently, in addition to the primary inference task, a malicious service provider could infer secondary (sensitive) attributes from the input, compromising the client's privacy. The goal of our work is to improve inference privacy by injecting noise to the input to hide the irrelevant features that are not conducive to the primary classification task. To this end, we propose Adaptive Noise Injection (ANI), which uses a light-weight DNN on the client-side to inject noise to each input, before transmitting it to the service provider to perform inference. Our key insight is that by customizing the noise to each input, we can achieve state-of-the-art trade-off between utility and privacy (up to 48.5% degradation in sensitive-task accuracy with <1% degradation in primary accuracy), significantly outperforming existing noise injection schemes. Our method does not require prior knowledge of the sensitive attributes and incurs minimal computational overheads.

11.5CRSep 18, 2020Code

MIRAGE: Mitigating Conflict-Based Cache Attacks with a Practical Fully-Associative Design

Gururaj Saileshwar, Moinuddin Qureshi

Shared processor caches are vulnerable to conflict-based side-channel attacks, where an attacker can monitor access patterns of a victim by evicting victim cache lines using cache-set conflicts. Recent mitigations propose randomized mapping of addresses to cache lines to obfuscate the locations of set-conflicts. However, these are vulnerable to new attacks that discover conflicting sets of addresses despite such mitigations, because these designs select eviction-candidates from a small set of conflicting lines. This paper presents Mirage, a practical design for a fully associative cache, wherein eviction candidates are selected randomly from all lines resident in the cache, to be immune to set-conflicts. A key challenge for enabling such designs in large shared caches (containing tens of thousands of cache lines) is the complexity of cache-lookup, as a naive design can require searching through all the resident lines. Mirage achieves full-associativity while retaining practical set-associative lookups by decoupling placement and replacement, using pointer-based indirection from tag-store to data-store to allow a newly installed address to globally evict the data of any random resident line. To eliminate set-conflicts, Mirage provisions extra invalid tags in a skewed-associative tag-store design where lines can be installed without set-conflict, along with a load-aware skew-selection policy that guarantees the availability of sets with invalid tags. Our analysis shows Mirage provides the global eviction property of a fully-associative cache throughout system lifetime (violations of full-associativity, i.e. set-conflicts, occur less than once in 10^4 to 10^17 years), thus offering a principled defense against any eviction-set discovery and any potential conflict based attacks. Mirage incurs limited slowdown (2%) and 17-20% extra storage compared to a non-secure cache.

27.4MLMay 6, 2020Code

MAZE: Data-Free Model Stealing Attack Using Zeroth-Order Gradient Estimation

Sanjay Kariyappa, Atul Prakash, Moinuddin Qureshi

Model Stealing (MS) attacks allow an adversary with black-box access to a Machine Learning model to replicate its functionality, compromising the confidentiality of the model. Such attacks train a clone model by using the predictions of the target model for different inputs. The effectiveness of such attacks relies heavily on the availability of data necessary to query the target model. Existing attacks either assume partial access to the dataset of the target model or availability of an alternate dataset with semantic similarities. This paper proposes MAZE -- a data-free model stealing attack using zeroth-order gradient estimation. In contrast to prior works, MAZE does not require any data and instead creates synthetic data using a generative model. Inspired by recent works in data-free Knowledge Distillation (KD), we train the generative model using a disagreement objective to produce inputs that maximize disagreement between the clone and the target model. However, unlike the white-box setting of KD, where the gradient information is available, training a generator for model stealing requires performing black-box optimization, as it involves accessing the target model under attack. MAZE relies on zeroth-order gradient estimation to perform this optimization and enables a highly accurate MS attack. Our evaluation with four datasets shows that MAZE provides a normalized clone accuracy in the range of 0.91x to 0.99x, and outperforms even the recent attacks that rely on partial data (JBDA, clone accuracy 0.13x to 0.69x) and surrogate data (KnockoffNets, clone accuracy 0.52x to 0.97x). We also study an extension of MAZE in the partial-data setting and develop MAZE-PD, which generates synthetic data closer to the target distribution. MAZE-PD further improves the clone accuracy (0.97x to 1.0x) and reduces the query required for the attack by 2x-24x.

22.1MLNov 16, 2019Code

Defending Against Model Stealing Attacks with Adaptive Misinformation

Sanjay Kariyappa, Moinuddin K Qureshi

Deep Neural Networks (DNNs) are susceptible to model stealing attacks, which allows a data-limited adversary with no knowledge of the training dataset to clone the functionality of a target model, just by using black-box query access. Such attacks are typically carried out by querying the target model using inputs that are synthetically generated or sampled from a surrogate dataset to construct a labeled dataset. The adversary can use this labeled dataset to train a clone model, which achieves a classification accuracy comparable to that of the target model. We propose "Adaptive Misinformation" to defend against such model stealing attacks. We identify that all existing model stealing attacks invariably query the target model with Out-Of-Distribution (OOD) inputs. By selectively sending incorrect predictions for OOD queries, our defense substantially degrades the accuracy of the attacker's clone model (by up to 40%), while minimally impacting the accuracy (<0.5%) for benign users. Compared to existing defenses, our defense has a significantly better security vs accuracy trade-off and incurs minimal computational overhead.

6.8CRJun 6, 2019

Lookout for Zombies: Mitigating Flush+Reload Attack on Shared Caches by Monitoring Invalidated Lines

Gururaj Saileshwar, Moinuddin K. Qureshi

OS-based page sharing is a commonly used optimization in modern systems to reduce memory footprint. Unfortunately, such sharing can cause Flush+Reload cache attacks, whereby a spy periodically flushes a cache line of shared data (using the clflush instruction) and reloads it to infer the access patterns of a victim application. Current proposals to mitigate Flush+Reload attacks are impractical as they either disable page sharing, or require application rewrite, or require OS support, or incur ISA changes. Ideally, we want to tolerate attacks without requiring any OS or ISA support and while incurring negligible performance and storage overheads. This paper makes the key observation that when a cache line is invalidated due to a Flush-Caused Invalidation (FCI), the tag and data of the invalidated line are still resident in the cache and can be used for detecting Flush-based attacks. We call lines invalidated due to FCI as Zombie lines. Our design explicitly marks such lines as Zombies, preserves the Zombie lines in the cache, and uses the hits and misses to Zombie lines to tolerate the attacks. We propose Zombie-Based Mitigation (ZBM), a simple hardware-based design that successfully guards against attacks by simply treating hits on Zombie-lines as misses to avoid any timing leaks to the attacker. We analyze the robustness of ZBM using three spy programs: attacking AES T-Tables, attacking RSA Square-and-Multiply, and Function Watcher (FW), and show that ZBM successfully defends against these attacks. Our solution requires negligible storage (4-bits per cache line), retains OS-based page sharing, requires no OS/ISA changes, and does not incur slowdown for benign applications.

28.8MLJan 28, 2019

Improving Adversarial Robustness of Ensembles with Diversity Training

Sanjay Kariyappa, Moinuddin K. Qureshi

Deep Neural Networks are vulnerable to adversarial attacks even in settings where the attacker has no direct access to the model being attacked. Such attacks usually rely on the principle of transferability, whereby an attack crafted on a surrogate model tends to transfer to the target model. We show that an ensemble of models with misaligned loss gradients can provide an effective defense against transfer-based attacks. Our key insight is that an adversarial example is less likely to fool multiple models in the ensemble if their loss functions do not increase in a correlated fashion. To this end, we propose Diversity Training, a novel method to train an ensemble of models with uncorrelated loss functions. We show that our method significantly improves the adversarial robustness of ensembles and can also be combined with existing methods to create a stronger defense.