Ruichuan Chen

DC
h-index14
7papers
63citations
Novelty57%
AI Score47

7 Papers

DCMay 4, 2022Code
SMLT: A Serverless Framework for Scalable and Adaptive Machine Learning Design and Training

Ahsan Ali, Syed Zawad, Paarijaat Aditya et al.

In today's production machine learning (ML) systems, models are continuously trained, improved, and deployed. ML design and training are becoming a continuous workflow of various tasks that have dynamic resource demands. Serverless computing is an emerging cloud paradigm that provides transparent resource management and scaling for users and has the potential to revolutionize the routine of ML design and training. However, hosting modern ML workflows on existing serverless platforms has non-trivial challenges due to their intrinsic design limitations such as stateless nature, limited communication support across function instances, and limited function execution duration. These limitations result in a lack of an overarching view and adaptation mechanism for training dynamics and an amplification of existing problems in ML workflows. To address the above challenges, we propose SMLT, an automated, scalable, and adaptive serverless framework to enable efficient and user-centric ML design and training. SMLT employs an automated and adaptive scheduling mechanism to dynamically optimize the deployment and resource scaling for ML tasks during training. SMLT further enables user-centric ML workflow execution by supporting user-specified training deadlines and budget limits. In addition, by providing an end-to-end design, SMLT solves the intrinsic problems in serverless platforms such as the communication overhead, limited function execution duration, need for repeated initialization, and also provides explicit fault tolerance for ML training. SMLT is open-sourced and compatible with all major ML frameworks. Our experimental evaluation with large, sophisticated modern ML models demonstrate that SMLT outperforms the state-of-the-art VM based systems and existing serverless ML training frameworks in both training speed (up to 8X) and monetary cost (up to 3X)

DCApr 28
Janus: Disaggregating Attention and Experts for Scalable MoE Inference

Zhexiang Zhang, Ye Wang, Yumiao Zhao et al.

Serving large Mixture-of-Experts (MoE) models is challenging because of their large memory footprints, heterogeneous resource demands, and highly dynamic inference workloads. Most existing MoE inference systems deploy the entire model as a monolithic unit, forcing attention and MoE layers to share the same resource configuration despite their different scaling behaviors and resource bottlenecks. Such coarse-grained provisioning leads to resource inefficiency and suboptimal performance. We present JANUS, a scalable and resource-efficient MoE inference system built around three key principles. First, JANUS disaggregates attention and MoE layers onto separate GPU worker pools, enabling independent resource provisioning for the two layer types, and uses an adaptive two-phase communication mechanism for low-latency data exchange. Second, because MoE-layer execution is often memory-bound and highly sensitive to activated-expert imbalance, JANUS introduces a lightweight, microsecond-scale activation scheduler that balances per-layer activated experts across MoE instances to reduce inference latency. Third, JANUS employs a fine-grained, SLO-aware resource scaling scheme that jointly selects attention resources, MoE resources, and expert placement to minimize GPU cost under token-level SLOs. Evaluation shows that JANUS improves per-GPU throughput by up to 4.7x over state-of-the-art MoE inference baselines while satisfying token-level latency SLOs.

CRAug 4, 2024
Model Hijacking Attack in Federated Learning

Zheng Li, Siyuan Wu, Ruichuan Chen et al.

Machine learning (ML), driven by prominent paradigms such as centralized and federated learning, has made significant progress in various critical applications ranging from autonomous driving to face recognition. However, its remarkable success has been accompanied by various attacks. Recently, the model hijacking attack has shown that ML models can be hijacked to execute tasks different from their original tasks, which increases both accountability and parasitic computational risks. Nevertheless, thus far, this attack has only focused on centralized learning. In this work, we broaden the scope of this attack to the federated learning domain, where multiple clients collaboratively train a global model without sharing their data. Specifically, we present HijackFL, the first-of-its-kind hijacking attack against the global model in federated learning. The adversary aims to force the global model to perform a different task (called hijacking task) from its original task without the server or benign client noticing. To accomplish this, unlike existing methods that use data poisoning to modify the target model's parameters, HijackFL searches for pixel-level perturbations based on their local model (without modifications) to align hijacking samples with the original ones in the feature space. When performing the hijacking task, the adversary applies these cloaks to the hijacking samples, compelling the global model to identify them as original samples and predict them accordingly. We conduct extensive experiments on four benchmark datasets and three popular models. Empirical results demonstrate that its attack performance outperforms baselines. We further investigate the factors that affect its performance and discuss possible defenses to mitigate its impact.

CRSep 29, 2025
Defeating Cerberus: Concept-Guided Privacy-Leakage Mitigation in Multimodal Language Models

Boyang Zhang, Istemi Ekin Akkus, Ruichuan Chen et al.

Multimodal large language models (MLLMs) have demonstrated remarkable capabilities in processing and reasoning over diverse modalities, but their advanced abilities also raise significant privacy concerns, particularly regarding Personally Identifiable Information (PII) leakage. While relevant research has been conducted on single-modal language models to some extent, the vulnerabilities in the multimodal setting have yet to be fully investigated. In this work, we investigate these emerging risks with a focus on vision language models (VLMs), a representative subclass of MLLMs that covers the two modalities most relevant for PII leakage, vision and text. We introduce a concept-guided mitigation approach that identifies and modifies the model's internal states associated with PII-related content. Our method guides VLMs to refuse PII-sensitive tasks effectively and efficiently, without requiring re-training or fine-tuning. We also address the current lack of multimodal PII datasets by constructing various ones that simulate real-world scenarios. Experimental results demonstrate that the method can achieve an average refusal rate of 93.3% for various PII-related tasks with minimal impact on unrelated model performances. We further examine the mitigation's performance under various conditions to show the adaptability of our proposed method.

DCDec 11, 2024
Protecting Confidentiality, Privacy and Integrity in Collaborative Learning

Dong Chen, Alice Dethise, Istemi Ekin Akkus et al.

A collaboration between dataset owners and model owners is needed to facilitate effective machine learning (ML) training. During this collaboration, however, dataset owners and model owners want to protect the confidentiality of their respective assets (i.e., datasets, models and training code), with the dataset owners also caring about the privacy of individual users whose data is in their datasets. Existing solutions either provide limited confidentiality for models and training code, or suffer from privacy issues due to collusion. We present Citadel++, a collaborative ML training system designed to simultaneously protect the confidentiality of datasets, models and training code as well as the privacy of individual users. Citadel++ enhances differential privacy mechanisms to safeguard the privacy of individual user data while maintaining model utility. By employing Virtual Machine-level Trusted Execution Environments (TEEs) as well as the improved sandboxing and integrity mechanisms through OS-level techniques, Citadel++ effectively preserves the confidentiality of datasets, models and training code, and enforces our privacy mechanisms even when the models and training code have been maliciously designed. Our experiments show that Citadel++ provides model utility and performance while adhering to the confidentiality and privacy requirements of dataset owners and model owners, outperforming the state-of-the-art privacy-preserving training systems by up to 543x on CPU and 113x on GPU TEEs.

CRMay 4, 2021
Citadel: Protecting Data Privacy and Model Confidentiality for Collaborative Learning with SGX

Chengliang Zhang, Junzhe Xia, Baichen Yang et al.

With the advancement of machine learning (ML) and its growing awareness, many organizations who own data but not ML expertise (data owner) would like to pool their data and collaborate with those who have expertise but need data from diverse sources to train truly generalizable models (model owner). In such collaborative ML, the data owner wants to protect the privacy of its training data, while the model owner desires the confidentiality of the model and the training method which may contain intellectual properties. However, existing private ML solutions, such as federated learning and split learning, cannot meet the privacy requirements of both data and model owners at the same time. This paper presents Citadel, a scalable collaborative ML system that protects the privacy of both data owner and model owner in untrusted infrastructures with the help of Intel SGX. Citadel performs distributed training across multiple training enclaves running on behalf of data owners and an aggregator enclave on behalf of the model owner. Citadel further establishes a strong information barrier between these enclaves by means of zero-sum masking and hierarchical aggregation to prevent data/model leakage during collaborative training. Compared with the existing SGX-protected training systems, Citadel enables better scalability and stronger privacy guarantees for collaborative ML. Cloud deployment with various ML models shows that Citadel scales to a large number of enclaves with less than 1.73X slowdown caused by SGX.

DCJan 19, 2017
Privacy Preserving Stream Analytics: The Marriage of Randomized Response and Approximate Computing

Do Le Quoc, Martin Beck, Pramod Bhatotia et al.

How to preserve users' privacy while supporting high-utility analytics for low-latency stream processing? To answer this question: we describe the design, implementation, and evaluation of PRIVAPPROX, a data analytics system for privacy-preserving stream processing. PRIVAPPROX provides three properties: (i) Privacy: zero-knowledge privacy guarantees for users, a privacy bound tighter than the state-of-the-art differential privacy; (ii) Utility: an interface for data analysts to systematically explore the trade-offs between the output accuracy (with error-estimation) and query execution budget; (iii) Latency: near real-time stream processing based on a scalable "synchronization-free" distributed architecture. The key idea behind our approach is to marry two existing techniques together: namely, sampling (used in the context of approximate computing) and randomized response (used in the context of privacy-preserving analytics). The resulting marriage is complementary - it achieves stronger privacy guarantees and also improves performance, a necessary ingredient for achieving low-latency stream analytics.