LGAug 23, 2023
ULDP-FL: Federated Learning with Across Silo User-Level Differential PrivacyFumiyuki Kato, Li Xiong, Shun Takagi et al.
Differentially Private Federated Learning (DP-FL) has garnered attention as a collaborative machine learning approach that ensures formal privacy. Most DP-FL approaches ensure DP at the record-level within each silo for cross-silo FL. However, a single user's data may extend across multiple silos, and the desired user-level DP guarantee for such a setting remains unknown. In this study, we present Uldp-FL, a novel FL framework designed to guarantee user-level DP in cross-silo FL where a single user's data may belong to multiple silos. Our proposed algorithm directly ensures user-level DP through per-user weighted clipping, departing from group-privacy approaches. We provide a theoretical analysis of the algorithm's privacy and utility. Additionally, we enhance the utility of the proposed algorithm with an enhanced weighting strategy based on user record distribution and design a novel private protocol that ensures no additional information is revealed to the silos and the server. Experiments on real-world datasets show substantial improvements in our methods in privacy-utility trade-offs under user-level DP compared to baseline methods. To the best of our knowledge, our work is the first FL framework that effectively provides user-level DP in the general cross-silo FL setting.
CRApr 8, 2022
Network Shuffling: Privacy Amplification via Random WalksSeng Pei Liew, Tsubasa Takahashi, Shun Takagi et al.
Recently, it is shown that shuffling can amplify the central differential privacy guarantees of data randomized with local differential privacy. Within this setup, a centralized, trusted shuffler is responsible for shuffling by keeping the identities of data anonymous, which subsequently leads to stronger privacy guarantees for systems. However, introducing a centralized entity to the originally local privacy model loses some appeals of not having any centralized entity as in local differential privacy. Moreover, implementing a shuffler in a reliable way is not trivial due to known security issues and/or requirements of advanced hardware or secure computation technology. Motivated by these practical considerations, we rethink the shuffle model to relax the assumption of requiring a centralized, trusted shuffler. We introduce network shuffling, a decentralized mechanism where users exchange data in a random-walk fashion on a network/graph, as an alternative of achieving privacy amplification via anonymity. We analyze the threat model under such a setting, and propose distributed protocols of network shuffling that is straightforward to implement in practice. Furthermore, we show that the privacy amplification rate is similar to other privacy amplification techniques such as uniform shuffling. To our best knowledge, among the recently studied intermediate trust models that leverage privacy amplification techniques, our work is the first that is not relying on any centralized entity to achieve privacy amplification.
CRMay 13, 2024
HRNet: Differentially Private Hierarchical and Multi-Resolution Network for Human Mobility Data SynthesizationShun Takagi, Li Xiong, Fumiyuki Kato et al.
Human mobility data offers valuable insights for many applications such as urban planning and pandemic response, but its use also raises privacy concerns. In this paper, we introduce the Hierarchical and Multi-Resolution Network (HRNet), a novel deep generative model specifically designed to synthesize realistic human mobility data while guaranteeing differential privacy. We first identify the key difficulties inherent in learning human mobility data under differential privacy. In response to these challenges, HRNet integrates three components: a hierarchical location encoding mechanism, multi-task learning across multiple resolutions, and private pre-training. These elements collectively enhance the model's ability under the constraints of differential privacy. Through extensive comparative experiments utilizing a real-world dataset, HRNet demonstrates a marked improvement over existing methods in balancing the utility-privacy trade-off.
LGFeb 15, 2022
OLIVE: Oblivious Federated Learning on Trusted Execution Environment against the risk of sparsificationFumiyuki Kato, Yang Cao, Masatoshi Yoshikawa
Combining Federated Learning (FL) with a Trusted Execution Environment (TEE) is a promising approach for realizing privacy-preserving FL, which has garnered significant academic attention in recent years. Implementing the TEE on the server side enables each round of FL to proceed without exposing the client's gradient information to untrusted servers. This addresses usability gaps in existing secure aggregation schemes as well as utility gaps in differentially private FL. However, to address the issue using a TEE, the vulnerabilities of server-side TEEs need to be considered -- this has not been sufficiently investigated in the context of FL. The main technical contribution of this study is the analysis of the vulnerabilities of TEE in FL and the defense. First, we theoretically analyze the leakage of memory access patterns, revealing the risk of sparsified gradients, which are commonly used in FL to enhance communication efficiency and model accuracy. Second, we devise an inference attack to link memory access patterns to sensitive information in the training dataset. Finally, we propose an oblivious yet efficient aggregation algorithm to prevent memory access pattern leakage. Our experiments on real-world data demonstrate that the proposed method functions efficiently in practical scales.
CRApr 14, 2021
Preventing Manipulation Attack in Local Differential Privacy using Verifiable Randomization MechanismFumiyuki Kato, Yang Cao, Masatoshi Yoshikawa
Several randomization mechanisms for local differential privacy (LDP) (e.g., randomized response) are well-studied to improve the utility. However, recent studies show that LDP is generally vulnerable to malicious data providers in nature. Because a data collector has to estimate background data distribution only from already randomized data, malicious data providers can manipulate their output before sending, i.e., randomization would provide them plausible deniability. Attackers can skew the estimations effectively since they are calculated by normalizing with randomization probability defined in the LDP protocol, and can even control the estimations. In this paper, we show how we prevent malicious attackers from compromising LDP protocol. Our approach is to utilize a verifiable randomization mechanism. The data collector can verify the completeness of executing an agreed randomization mechanism for every data provider. Our proposed method completely protects the LDP protocol from output-manipulations, and significantly mitigates the expected damage from attacks. We do not assume any specific attacks, and it works effectively against general output-manipulation, and thus is more powerful than previously proposed countermeasures. We describe the secure version of three state-of-the-art LDP protocols and empirically show they cause acceptable overheads according to several parameters.
CRDec 7, 2020
PCT-TEE: Trajectory-based Private Contact Tracing System with Trusted Execution EnvironmentFumiyuki Kato, Yang Cao, Masatoshi Yoshikawa
Existing Bluetooth-based Private Contact Tracing (PCT) systems can privately detect whether people have come into direct contact with COVID-19 patients. However, we find that the existing systems lack functionality and flexibility, which may hurt the success of the contact tracing. Specifically, they cannot detect indirect contact (e.g., people may be exposed to coronavirus because of used the same elevator even without direct contact); they also cannot flexibly change the rules of "risky contact", such as how many hours of exposure or how close to a COVID-19 patient that is considered as risk exposure, which may be changed with the environmental situation. In this paper, we propose an efficient and secure contact tracing system that enables both direct contact and indirect contact. To address the above problems, we need to utilize users' trajectory data for private contact tracing, which we call trajectory-based PCT. We formalize this problem as Spatiotemporal Private Set Intersection. By analyzing different approaches such as homomorphic encryption that could be extended to solve this problem, we identify that Trusted Execution Environment (TEE) is a proposing method to achieve our requirements. The major challenge is how to design algorithms for spatiotemporal private set intersection under limited secure memory of TEE. To this end, we design a TEE-based system with flexible trajectory data encoding algorithms. Our experiments on real-world data show that the proposed system can process thousands of queries on tens of million records of trajectory data in a few seconds.
CROct 26, 2020
Secure and Efficient Trajectory-Based Contact Tracing using Trusted HardwareFumiyuki Kato, Yang Cao, Masatoshi Yoshikawa
The COVID-19 pandemic has prompted technological measures to control the spread of the disease. Private contact tracing (PCT) is one of the promising techniques for the purpose. However, the recently proposed Bluetooth-based PCT has several limitations in terms of functionality and flexibility. The existing systems are only able to detect direct contact (i.e., human-human contact), but cannot detect indirect contact (i.e., human-object, such as the disease transmission through surface). Moreover, the rule of risky contact cannot be flexibly changed with the environmental situation and the nature of the virus. In this paper, we propose a secure and efficient trajectory-based PCT system using trusted hardware. We formalize trajectory-based PCT as a generalization of the well-studied Private Set Intersection (PSI), which is mostly based on cryptographic primitives and thus insufficient. We solve the problem by leveraging trusted hardware such as Intel SGX and designing a novel algorithm to achieve a secure, efficient and flexible PCT system. Our experiments on real-world data show that the proposed system can achieve high performance and scalability. Specifically, our system (one single machine with Intel SGX) can process thousands of queries on 100 million records of trajectory data in a few seconds.