Computing on Masked Data to improve the Security of Big Data
This work addresses security concerns for organizations, such as those supporting homeland security missions, that rely on cloud-based big data analytics, though its contribution appears incremental because it combines existing database and cryptographic techniques.
The paper tackles the security challenge of processing big data in untrusted cloud environments by proposing the Computing on Masked Data (CMD) tool, which securely offloads mathematical operations like correlation or thresholding to the cloud with low overhead.
Organizations that make use of large quantities of information require the ability to store and process data from central locations so that the product can be shared or distributed across a heterogeneous group of users. However, recent events underscore the need to improve the security of data stored on such untrusted servers or databases. Advances in cryptographic techniques and database technologies provide the necessary security functionality but rely on a computational model in which the cloud is used solely for storage and retrieval. Much big data computation and analytics relies on signal processing fundamentals. As the trend of moving data storage and computation to the cloud increases, organizations supporting homeland security missions need to understand the impact of security on key signal processing kernels such as correlation or thresholding. In this article, we propose a tool called Computing on Masked Data (CMD), which combines advances in database technologies and cryptographic tools to provide a low-overhead mechanism for securely offloading certain mathematical operations to the cloud. This article describes the design and development of the CMD tool.
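To make the idea of offloading a kernel such as correlation or thresholding onto masked data more concrete, the following is a minimal sketch, not the CMD tool itself. It assumes a deterministic keyed mask (HMAC-SHA256) applied by the client before upload; the function names `mask`, `mask_record`, `correlate`, and `threshold` are hypothetical and chosen for illustration only. Because equal plaintext tokens map to equal masked labels, an untrusted server can compute an inner-product correlation over the masked labels and obtain the same score it would get on the plaintext. Note that deterministic masking reveals equality and frequency patterns, so this illustrates the trade-off rather than a complete security design.

```python
import hashlib
import hmac
from collections import Counter


def mask(token: str, key: bytes) -> str:
    """Client side: deterministically mask one token (assumed scheme: HMAC-SHA256)."""
    return hmac.new(key, token.encode("utf-8"), hashlib.sha256).hexdigest()


def mask_record(tokens, key: bytes) -> Counter:
    """Client side: turn a record into a sparse count vector over masked labels."""
    return Counter(mask(t, key) for t in tokens)


def correlate(a: Counter, b: Counter) -> int:
    """Server side: inner product of two sparse count vectors.

    Only label equality is used, so this works on masked or plain labels alike.
    """
    return sum(count * b[label] for label, count in a.items())


def threshold(scores: dict, minimum: int) -> list:
    """Server side: names of records whose score meets the minimum."""
    return sorted(name for name, score in scores.items() if score >= minimum)


# Hypothetical data; the key never leaves the client.
key = b"client-only secret"
query = ["alice", "bob", "carol", "bob"]
records = {
    "r1": ["bob", "dave", "carol"],
    "r2": ["erin", "frank"],
}

# Reference result computed on plaintext (what the client would get locally).
plain_scores = {name: correlate(Counter(query), Counter(toks))
                for name, toks in records.items()}

# Same computation performed on masked data (what the server would see).
masked_query = mask_record(query, key)
masked_scores = {name: correlate(masked_query, mask_record(toks, key))
                 for name, toks in records.items()}

hits = threshold(masked_scores, minimum=1)
```

In this sketch the server only ever handles hexadecimal labels and integer counts, yet `masked_scores` matches `plain_scores`, which is the property that lets the correlation and thresholding steps be offloaded.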