IVFeb 7, 2024
RAGE for the Machine: Image Compression with Low-Cost Random Access for Embedded ApplicationsChristian D. Rask, Daniel E. Lucani
We introduce RAGE, an image compression framework that achieves four generally conflicting objectives: 1) good compression for a wide variety of color images, 2) computationally efficient, fast decompression, 3) fast random access of images with pixel-level granularity without the need to decompress the entire image, 4) support for both lossless and lossy compression. To achieve these, we rely on the recent concept of generalized deduplication (GD), which is known to provide efficient lossless (de)compression and fast random access in time-series data, and deliver key expansions suitable for image compression, both lossless and lossy. Using nine different datasets, incl. graphics, logos, natural images, we show that RAGE has similar or better compression ratios to state-of-the-art lossless image compressors, while delivering pixel-level random access capabilities. Tests in an ARM Cortex-M33 platform show seek times between 9.9 and 40.6~ns and average decoding time per pixel between 274 and 1226~ns. Our measurements also show that RAGE's lossy variant, RAGE-Q, outperforms JPEG by several fold in terms of distortion in embedded graphics and has reasonable compression and distortion for natural images.
LGJun 27, 2025
dreaMLearning: Data Compression Assisted Machine LearningXiaobo Zhao, Aaron Hurst, Panagiotis Karras et al.
Despite rapid advancements, machine learning, particularly deep learning, is hindered by the need for large amounts of labeled data to learn meaningful patterns without overfitting and immense demands for computation and storage, which motivate research into architectures that can achieve good performance with fewer resources. This paper introduces dreaMLearning, a novel framework that enables learning from compressed data without decompression, built upon Entropy-based Generalized Deduplication (EntroGeDe), an entropy-driven lossless compression method that consolidates information into a compact set of representative samples. DreaMLearning accommodates a wide range of data types, tasks, and model architectures. Extensive experiments on regression and classification tasks with tabular and image data demonstrate that dreaMLearning accelerates training by up to 8.8x, reduces memory usage by 10x, and cuts storage by 42%, with a minimal impact on model performance. These advancements enhance diverse ML applications, including distributed and federated learning, and tinyML on resource-constrained edge devices, unlocking new possibilities for efficient and scalable learning.
CRFeb 28, 2022
Bonsai: A Generalized Look at Dual DeduplicationHadi Sehat, Anders Lindskov Kloborg, Christian Mørup et al.
Cloud Service Providers (CSPs) offer a vast amount of storage space at competitive prices to cope with the growing demand for digital data storage. Dual deduplication is a recent framework designed to improve data compression on the CSP while keeping clients' data private from the CSP. To achieve this, clients perform lightweight information-theoretic transformations to their data prior to upload. We investigate the effectiveness of dual deduplication, and propose an improvement for the existing state-of-the-art method. We name our proposal Bonsai as it aims at reducing storage fingerprint and improving scalability. In detail, Bonsai achieves (1) significant reduction in client storage, (2) reduction in total required storage (client + CSP), and (3) reducing the deduplication time on the CSP. Our experiments show that Bonsai achieves compression rates of 68\% on the cloud and 5\% on the client, while allowing the cloud to identify deduplications in a time-efficient manner. We also show that combining our method with universal compressors in the cloud, e.g., Brotli, can yield better overall compression on the data compared to only applying the universal compressor or plain Bonsai. Finally, we show that Bonsai and its variants provide sufficient privacy against an honest-but-curious CPS that knows the distribution of the Clients' original data.
CRJan 26, 2022
Bifrost: Secure, Scalable and Efficient File Sharing System Using Dual DeduplicationHadi Sehat, Elena Pagnin, Daniel E. Lucani
We consider the problem of sharing sensitive or valuable files across users while partially relying on a common, untrusted third-party, e.g., a Cloud Storage Provider (CSP). Although users can rely on a secure peer-to-peer (P2P) channel for file sharing, this introduces potential delay on the data transfer and requires the sender to remain active and connected while the transfer process occurs. Instead of using the P2P channel for the entire file, users can upload information about the file on a common CSP and share only the essential information that enables the receiver to download and recover the original file. This paper introduces Bifrost, an innovative file sharing system inspired by recent results on dual deduplication. Bifrost achieves the desired functionality and simultaneously guarantees that (1) the CSP can efficiently compress outsourced data; (2) the secure P2P channel is used only to transmit short, but crucial information; (3) users can check for data integrity, i.e., detect if the CSP alters the outsourced data; and (4) only the sender (data owner) and the intended receiver can access the file after sharing, i.e., the cloud or no malicious adversary can infer useful information about the shared file. We analyze compression and bandwidth performance using a proof-of-concept implementation. Our experiments show that secure file sharing can be achieved by sending only 650 bits on the P2P channel, irrespective of file size, while the CSP that aids the sharing can enjoy a compression rate of 86.9 %.
ITJul 19, 2021
Energy Efficient Data Recovery from Corrupted LoRa FramesNiloofar Yazdani, Nikolaos Kouvelas, R Venkatesha Prasad et al.
High frame-corruption is widely observed in Long Range Wide Area Networks (LoRaWAN) due to the coexistence with other networks in ISM bands and an Aloha-like MAC layer. LoRa's Forward Error Correction (FEC) mechanism is often insufficient to retrieve corrupted data. In fact, real-life measurements show that at least one-fourth of received transmissions are corrupted. When more frames are dropped, LoRa nodes usually switch over to higher spreading factors (SF), thus increasing transmission times and increasing the required energy. This paper introduces ReDCoS, a novel coding technique at the application layer that improves recovery of corrupted LoRa frames, thus reducing the overall transmission time and energy invested by LoRa nodes by several-fold. ReDCoS utilizes lightweight coding techniques to pre-encode the transmitted data. Therefore, the inbuilt Cyclic Redundancy Check (CRC) that follows is computed based on an already encoded data. At the receiver, we use both the CRC and the coded data to recover data from a corrupted frame beyond the built-in Error Correcting Code (ECC). We compare the performance of ReDCoS to (I) the standard FEC of vanilla-LoRaWAN, and to (ii) RS coding applied as ECC to the data of LoRaWAN. The results indicated a 54x and 13.5x improvement of decoding ratio, respectively, when 20 data symbols were sent. Furthermore, we evaluated ReDCoS on-field using LoRa SX1261 transceivers showing that it outperformed RS-coding by factor of at least 2x (and up to 6x) in terms of the decoding ratio while consuming 38.5% less energy per correctly received transmission.
CRJul 22, 2020
Yggdrasil: Privacy-aware Dual Deduplication in Multi Client SettingsHadi Sehat, Elena Pagnin, Daniel E. Lucani
This paper proposes Yggdrasil, a protocol for privacy-aware dual data deduplication in multi client settings. Yggdrasil is designed to reduce the cloud storage space while safeguarding the privacy of the client's outsourced data. Yggdrasil combines three innovative tools to achieve this goal. First, generalized deduplication, an emerging technique to reduce data footprint. Second, non-deterministic transformations that are described compactly and improve the degree of data compression in the Cloud (across users). Third, data preprocessing in the clients in the form of lightweight, privacy-driven transformations prior to upload. This guarantees that an honest-but-curious Cloud service trying to retrieve the client's actual data will face a high degree of uncertainty as to what the original data is. We provide a mathematical analysis of the measure of uncertainty as well as the compression potential of our protocol. Our experiments with a HDFS log data set shows that 49% overall compression can be achieved, with clients storing only 12% for privacy and the Cloud storing the rest. This is achieved while ensuring that each fragment uploaded to the Cloud would have 10^296 possible original strings from the client. Higher uncertainty is possible, with some reduction of compression potential.
ITNov 18, 2015
Analysis and Optimization of Sparse Random Linear Network Coding for Reliable Multicast ServicesAndrea Tassi, Ioannis Chatzigeorgiou, Daniel E. Lucani
Point-to-multipoint communications are expected to play a pivotal role in next-generation networks. This paper refers to a cellular system transmitting layered multicast services to a multicast group of users. Reliability of communications is ensured via different Random Linear Network Coding (RLNC) techniques. We deal with a fundamental problem: the computational complexity of the RLNC decoder. The higher the number of decoding operations is, the more the user's computational overhead grows and, consequently, the faster the battery of mobile devices drains. By referring to several sparse RLNC techniques, and without any assumption on the implementation of the RLNC decoder in use, we provide an efficient way to characterize the performance of users targeted by ultra-reliable layered multicast services. The proposed modeling allows to efficiently derive the average number of coded packet transmissions needed to recover one or more service layers. We design a convex resource allocation framework that allows to minimize the complexity of the RLNC decoder by jointly optimizing the transmission parameters and the sparsity of the code. The designed optimization framework also ensures service guarantees to predetermined fractions of users. The performance of the proposed optimization framework is then investigated in a LTE-A eMBMS network multicasting H.264/SVC video services.