Sagar Sharma

CR
6papers
223citations
Novelty32%
AI Score21

6 Papers

CRDec 31, 2022
A Comparative Study of Image Disguising Methods for Confidential Outsourced Learning

Sagar Sharma, Yuechun Gu, Keke Chen

Large training data and expensive model tweaking are standard features of deep learning for images. As a result, data owners often utilize cloud resources to develop large-scale complex models, which raises privacy concerns. Existing solutions are either too expensive to be practical or do not sufficiently protect the confidentiality of data and models. In this paper, we study and compare novel \emph{image disguising} mechanisms, DisguisedNets and InstaHide, aiming to achieve a better trade-off among the level of protection for outsourced DNN model training, the expenses, and the utility of data. DisguisedNets are novel combinations of image blocktization, block-level random permutation, and two block-level secure transformations: random multidimensional projection (RMT) and AES pixel-level encryption (AES). InstaHide is an image mixup and random pixel flipping technique \cite{huang20}. We have analyzed and evaluated them under a multi-level threat model. RMT provides a better security guarantee than InstaHide, under the Level-1 adversarial knowledge with well-preserved model quality. In contrast, AES provides a security guarantee under the Level-2 adversarial knowledge, but it may affect model quality more. The unique features of image disguising also help us to protect models from model-targeted attacks. We have done an extensive experimental evaluation to understand how these methods work in different settings for different datasets.

LGDec 15, 2020
Confidential Machine Learning on Untrusted Platforms: A Survey

Sagar Sharma, Keke Chen

With the ever-growing data and the need for developing powerful machine learning models, data owners increasingly depend on various untrusted platforms (e.g., public clouds, edges, and machine learning service providers) for scalable processing or collaborative learning. Thus, sensitive data and models are in danger of unauthorized access, misuse, and privacy compromises. A relatively new body of research confidentially trains machine learning models on protected data to address these concerns. In this survey, we summarize notable studies in this emerging area of research. With a unified framework, we highlight the critical challenges and innovations in outsourcing machine learning confidentially. We focus on the cryptographic approaches for confidential machine learning (CML), primarily on model training, while also covering other directions such as perturbation-based approaches and CML in the hardware-assisted computing environment. The discussion will take a holistic way to consider a rich context of the related threat models, security assumptions, design principles, and associated trade-offs amongst data utility, cost, and confidentiality.

CRSep 8, 2020
SGX-MR: Regulating Dataflows for Protecting Access Patterns of Data-Intensive SGX Applications

A K M Mubashwir Alam, Sagar Sharma, Keke Chen

Intel SGX has been a popular trusted execution environment (TEE) for protecting the integrity and confidentiality of applications running on untrusted platforms such as cloud. However, the access patterns of SGX-based programs can still be observed by adversaries, which may leak important information for successful attacks. Researchers have been experimenting with Oblivious RAM (ORAM) to address the privacy of access patterns. ORAM is a powerful low-level primitive that provides application-agnostic protection for any I/O operations, however, at a high cost. We find that some application-specific access patterns, such as sequential block I/O, do not provide additional information to adversaries. Others, such as sorting, can be replaced with specific oblivious algorithms that are more efficient than ORAM. The challenge is that developers may need to look into all the details of application-specific access patterns to design suitable solutions, which is time-consuming and error-prone. In this paper, we present the lightweight SGX based MapReduce (SGX-MR) approach that regulates the dataflow of data-intensive SGX applications for easier application-level access-pattern analysis and protection. It uses the MapReduce framework to cover a large class of data-intensive applications, and the entire framework can be implemented with a small memory footprint. With this framework, we have examined the stages of data processing, identified the access patterns that need protection, and designed corresponding efficient protection methods. Our experiments show that SGX-MR based applications are much more efficient than ORAM-based implementations.

LGFeb 5, 2019
Disguised-Nets: Image Disguising for Privacy-preserving Outsourced Deep Learning

Sagar Sharma, Keke Chen

Deep learning model developers often use cloud GPU resources to experiment with large data and models that need expensive setups. However, this practice raises privacy concerns. Adversaries may be interested in: 1) personally identifiable information or objects encoded in the training images, and 2) the models trained with sensitive data to launch model-based attacks. Learning deep neural networks (DNN) from encrypted data is still impractical due to the large training data and the expensive learning process. A few recent studies have tried to provide efficient, practical solutions to protect data privacy in outsourced deep-learning. However, we find out that they are vulnerable under certain attacks. In this paper, we specifically identify two types of unique attacks on outsourced deep-learning: 1) the visual re-identification attack on the training data, and 2) the class membership attack on the learned models, which can break existing privacy-preserving solutions. We develop an image disguising approach to address these attacks and design a suite of methods to evaluate the levels of attack resilience for a privacy-preserving solution for outsourced deep learning. The experimental results show that our image-disguising mechanisms can provide a high level of protection against the two attacks while still generating high-quality DNN models for image classification.

CYApr 11, 2018
Towards Practical Privacy-Preserving Analytics for IoT and Cloud Based Healthcare Systems

Sagar Sharma, Keke Chen, Amit Sheth

Modern healthcare systems now rely on advanced computing methods and technologies, such as Internet of Things (IoT) devices and clouds, to collect and analyze personal health data at an unprecedented scale and depth. Patients, doctors, healthcare providers, and researchers depend on analytical models derived from such data sources to remotely monitor patients, early-diagnose diseases, and find personalized treatments and medications. However, without appropriate privacy protection, conducting data analytics becomes a source of a privacy nightmare. In this article, we present the research challenges in developing practical privacy-preserving analytics in healthcare information systems. The study is based on kHealth - a personalized digital healthcare information system that is being developed and tested for disease monitoring. We analyze the data and analytic requirements for the involved parties, identify the privacy assets, analyze existing privacy substrates, and discuss the potential tradeoff among privacy, efficiency, and model quality.

CRFeb 22, 2018
Confidential Boosting with Random Linear Classifiers for Outsourced User-generated Data

Sagar Sharma, Keke Chen

User-generated data is crucial to predictive modeling in many applications. With a web/mobile/wearable interface, a data owner can continuously record data generated by distributed users and build various predictive models from the data to improve their operations, services, and revenue. Due to the large size and evolving nature of users data, data owners may rely on public cloud service providers (Cloud) for storage and computation scalability. Exposing sensitive user-generated data and advanced analytic models to Cloud raises privacy concerns. We present a confidential learning framework, SecureBoost, for data owners that want to learn predictive models from aggregated user-generated data but offload the storage and computational burden to Cloud without having to worry about protecting the sensitive data. SecureBoost allows users to submit encrypted or randomly masked data to designated Cloud directly. Our framework utilizes random linear classifiers (RLCs) as the base classifiers in the boosting framework to dramatically simplify the design of the proposed confidential boosting protocols, yet still preserve the model quality. A Cryptographic Service Provider (CSP) is used to assist the Cloud's processing, reducing the complexity of the protocol constructions. We present two constructions of SecureBoost: HE+GC and SecSh+GC, using combinations of homomorphic encryption, garbled circuits, and random masking to achieve both security and efficiency. For a boosted model, Cloud learns only the RLCs and the CSP learns only the weights of the RLCs. Finally, the data owner collects the two parts to get the complete model. We conduct extensive experiments to understand the quality of the RLC-based boosting and the cost distribution of the constructions. Our results show that SecureBoost can efficiently learn high-quality boosting models from protected user-generated data.