Lamassu: Storage-Efficient Host-Side Encryption
The paper addresses a key issue for storage customers, especially in cloud environments: encrypting data without sacrificing deduplication. The contribution is somewhat incremental, since it builds on existing convergent encryption, but it adds a novel approach to handling encryption metadata.
The paper tackles the problem of preserving storage-based data deduplication while encrypting data on the host. It presents Lamassu, which combines block-oriented convergent encryption with encryption metadata embedded directly in each file's data stream. The evaluation reports excellent storage efficiency and I/O throughput comparable to conventional encryption systems.
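The core idea of convergent encryption is that each block's key is derived from the block's own content, so identical plaintext blocks always produce identical ciphertext blocks, which a downstream storage system can still deduplicate. The sketch below illustrates only that property and is not Lamassu's implementation. It assumes a 4 KiB block size and uses plain SHA-256 of the block as the key. It also substitutes a toy SHA-256 counter-mode keystream for a real block cipher such as AES, so that it runs with the standard library alone; it is not secure for production use. The names `convergent_encrypt`, `convergent_decrypt` and `dedup_store` are hypothetical.

```python
import hashlib
import hmac

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def _keystream(key: bytes, length: int) -> bytes:
    # Toy keystream: SHA-256(key || counter). Stand-in for a real cipher
    # such as AES; NOT secure for production use.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def convergent_encrypt(block: bytes) -> tuple[bytes, bytes]:
    # The key is derived from the plaintext itself, so equal blocks get
    # equal keys and therefore equal ciphertexts.
    key = hashlib.sha256(block).digest()
    ciphertext = bytes(a ^ b for a, b in zip(block, _keystream(key, len(block))))
    return key, ciphertext


def convergent_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    plaintext = bytes(a ^ b for a, b in zip(ciphertext, _keystream(key, len(ciphertext))))
    # Integrity check: the key must equal the hash of the recovered plaintext.
    if not hmac.compare_digest(hashlib.sha256(plaintext).digest(), key):
        raise ValueError("integrity check failed")
    return plaintext


def dedup_store(blocks: list[bytes]) -> tuple[dict[bytes, bytes], list[bytes]]:
    # Simulates a storage system that sees only ciphertext and deduplicates
    # it by content fingerprint; the host keeps the per-block keys.
    store: dict[bytes, bytes] = {}
    keys: list[bytes] = []
    for blk in blocks:
        key, ct = convergent_encrypt(blk)
        store.setdefault(hashlib.sha256(ct).digest(), ct)
        keys.append(key)
    return store, keys
```

Writing the same 4 KiB block twice through `dedup_store` yields a single stored ciphertext and two identical keys. The decryption-time hash check shows why convergent keys double as integrity checks: a ciphertext that does not match its key is rejected.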
Many storage customers are adopting encryption solutions to protect critical data. Most existing encryption solutions sit in, or near, the application that is the source of critical data, upstream of the primary storage system. Placing encryption near the source ensures that data remains encrypted throughout the storage stack, making it easier to use untrusted storage, such as public clouds. Unfortunately, such a strategy also prevents downstream storage systems from applying content-based features, such as deduplication, to the data. In this paper, we present Lamassu, an encryption solution that uses block-oriented, host-based, convergent encryption to secure data, while preserving storage-based data deduplication. Unlike past convergent encryption systems, which typically store encryption metadata in a dedicated store, our system transparently inserts its metadata into each file's data stream. This allows us to add Lamassu to an application stack without modifying either the client application or the storage controller. We lay out the architecture and security model used in our system, and present a new model for maintaining metadata consistency and data integrity in a convergent encryption environment. We also evaluate Lamassu's storage efficiency and I/O performance by using a variety of microbenchmarks, showing that Lamassu provides excellent storage efficiency, while achieving I/O throughput on par with similar conventional encryption systems.
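Embedding metadata in the file's data stream means the host must translate logical block offsets into physical offsets that skip over interleaved metadata blocks, while keeping data blocks block-aligned so fixed-block deduplication downstream still matches them. The sketch below shows one possible layout under explicitly hypothetical parameters, not the layout the paper uses. The file is split into segments, each made of one metadata block followed by as many data blocks as there are 32-byte keys that fit in one 4 KiB metadata block (128). The names `physical_block`, `metadata_block_for`, `pack_metadata`, `unpack_key` and `physical_size` are illustrative only. Encrypting the metadata block under a secret master key is assumed but omitted.

```python
BLOCK_SIZE = 4096                          # assumed block size
KEY_SIZE = 32                              # assumed per-block key length (SHA-256 digest)
KEYS_PER_META = BLOCK_SIZE // KEY_SIZE     # 128 data blocks per segment (hypothetical)
SEGMENT_BLOCKS = 1 + KEYS_PER_META         # one metadata block, then its data blocks


def physical_block(logical: int) -> int:
    """Map a logical data-block index to its physical block index in the stored file."""
    segment, slot = divmod(logical, KEYS_PER_META)
    return segment * SEGMENT_BLOCKS + 1 + slot


def metadata_block_for(logical: int) -> int:
    """Physical index of the metadata block that holds the key for `logical`."""
    return (logical // KEYS_PER_META) * SEGMENT_BLOCKS


def pack_metadata(keys: list[bytes]) -> bytes:
    """Pack up to KEYS_PER_META per-block keys into one zero-padded metadata block.

    A real system would also encrypt this block under a secret master key;
    that step is omitted in this sketch.
    """
    if len(keys) > KEYS_PER_META or any(len(k) != KEY_SIZE for k in keys):
        raise ValueError("invalid key list")
    raw = b"".join(keys)
    return raw + bytes(BLOCK_SIZE - len(raw))


def unpack_key(meta: bytes, slot: int) -> bytes:
    """Extract the key stored in position `slot` of a metadata block."""
    return meta[slot * KEY_SIZE:(slot + 1) * KEY_SIZE]


def physical_size(logical_bytes: int) -> int:
    """Bytes stored for `logical_bytes` of user data (last data block padded)."""
    n_data = -(-logical_bytes // BLOCK_SIZE)   # ceiling division
    n_meta = -(-n_data // KEYS_PER_META)
    return (n_data + n_meta) * BLOCK_SIZE
```

Under these assumed parameters, logical block 128 lands at physical block 130, right after the second segment's metadata block at 129. The space overhead is one metadata block per 128 data blocks, under 1%. Because only the metadata blocks are unique per file, the data blocks between them remain candidates for storage-side deduplication.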