12.2LGMar 25
Uncovering Memorization in Timeseries Imputation models: LBRM Membership Inference and its link to attribute LeakageFaiz Taleb, Ivan Gazeau, Maryline Laurent
Deep learning models for time series imputation are now essential in fields such as healthcare, the Internet of Things (IoT), and finance. However, their deployment raises critical privacy concerns. Beyond the well-known issue of unintended memorization, which has been extensively studied in generative models, we demonstrate that time series models are vulnerable to inference attacks in a black-box setting. In this work, we introduce a two-stage attack framework comprising: (1) a novel membership inference attack based on a reference model that improves detection accuracy, even for models robust to overfitting-based attacks, and (2) the first attribute inference attack that predicts sensitive characteristics of the training data for timeseries imputation model. We evaluate these attacks on attention-based and autoencoder architectures in two scenarios: models that are trained from scratch, and fine-tuned models where the adversary has access to the initial weights. Our experimental results demonstrate that the proposed membership attack retrieves a significant portion of the training data with a tpr@top25% score significantly higher than a naive attack baseline. We show that our membership attack also provides a good insight of whether attribute inference will work (with a precision of 90% instead of 78% in the genral case).
LGMay 6, 2025
A new membership inference attack that spots memorization in generative and predictive models: Loss-Based with Reference Model algorithm (LBRM)Faiz Taleb, Ivan Gazeau, Maryline Laurent
Generative models can unintentionally memorize training data, posing significant privacy risks. This paper addresses the memorization phenomenon in time series imputation models, introducing the Loss-Based with Reference Model (LBRM) algorithm. The LBRM method leverages a reference model to enhance the accuracy of membership inference attacks, distinguishing between training and test data. Our contributions are twofold: first, we propose an innovative method to effectively extract and identify memorized training data, significantly improving detection accuracy. On average, without fine-tuning, the AUROC improved by approximately 40\%. With fine-tuning, the AUROC increased by approximately 60\%. Second, we validate our approach through membership inference attacks on two types of architectures designed for time series imputation, demonstrating the robustness and versatility of the LBRM approach in different contexts. These results highlight the significant enhancement in detection accuracy provided by the LBRM approach, addressing privacy risks in time series imputation models.
CRJun 6, 2017
Types for Location and Data Security in Cloud EnvironmentsIvan Gazeau, Tom Chothia, Dominic Duggan
Cloud service providers are often trusted to be genuine, the damage caused by being discovered to be attacking their own customers outweighs any benefits such attacks could reap. On the other hand, it is expected that some cloud service users may be actively malicious. In such an open system, each location may run code which has been developed independently of other locations (and which may be secret). In this paper, we present a typed language which ensures that the access restrictions put on data on a particular device will be observed by all other devices running typed code. Untyped, compromised devices can still interact with typed devices without being able to violate the policies, except in the case when a policy directly places trust in untyped locations. Importantly, our type system does not need a middleware layer or all users to register with a preexisting PKI, and it allows for devices to dynamically create new identities. The confidentiality property guaranteed by the language is defined for any kind of intruder: we consider labeled bisimilarity i.e. an attacker cannot distinguish two scenarios that differ by the change of a protected value. This shows our main result that, for a device that runs well typed code and only places trust in other well typed devices, programming errors cannot cause a data leakage.