Koki Hamada

CR
4papers
52citations
Novelty63%
AI Score27

4 Papers

CRDec 24, 2021
Efficient decision tree training with new data structure for secure multi-party computation

Koki Hamada, Dai Ikarashi, Ryo Kikuchi et al.

We propose a secure multi-party computation (MPC) protocol that constructs a secret-shared decision tree for a given secret-shared dataset. The previous MPC-based decision tree training protocol (Abspoel et al. 2021) requires $O(2^hmn\log n)$ comparisons, being exponential in the tree height $h$ and with $n$ and $m$ being the number of rows and that of attributes in the dataset, respectively. The cause of the exponential number of comparisons in $h$ is that the decision tree training algorithm is based on the divide-and-conquer paradigm, where dummy rows are added after each split in order to hide the number of rows in the dataset. We resolve this issue via secure data structure that enables us to compute an aggregate value for every group while hiding the grouping information. By using this data structure, we can train a decision tree without adding dummy rows while hiding the size of the intermediate data. We specifically describes a decision tree training protocol that requires only $O(hmn\log n)$ comparisons when the input attributes are continuous and the output attribute is binary. Note that the order is now \emph{linear} in the tree height $h$. To demonstrate the practicality of our protocol, we implement it in an MPC framework based on a three-party secret sharing scheme. Our implementation results show that our protocol trains a decision tree with a height of 5 in 33 seconds for a dataset of 100,000 rows and 10 attributes.

CRJul 22, 2021
Designing a Location Trace Anonymization Contest

Takao Murakami, Hiromi Arai, Koki Hamada et al.

For a better understanding of anonymization methods for location traces, we have designed and held a location trace anonymization contest that deals with a long trace (400 events per user) and fine-grained locations (1024 regions). In our contest, each team anonymizes her original traces, and then the other teams perform privacy attacks against the anonymized traces. In other words, both defense and attack compete together, which is close to what happens in real life. Prior to our contest, we show that re-identification alone is insufficient as a privacy risk and that trace inference should be added as an additional risk. Specifically, we show an example of anonymization that is perfectly secure against re-identification and is not secure against trace inference. Based on this, our contest evaluates both the re-identification risk and trace inference risk and analyzes their relationship. Through our contest, we show several findings in a situation where both defense and attack compete together. In particular, we show that an anonymization method secure against trace inference is also secure against re-identification under the presence of appropriate pseudonymization. We also report defense and attack algorithms that won first place, and analyze the utility of anonymized traces submitted by teams in various applications such as POI recommendation and geo-data analysis.

CRJun 4, 2021
Adam in Private: Secure and Fast Training of Deep Neural Networks with Adaptive Moment Estimation

Nuttapong Attrapadung, Koki Hamada, Dai Ikarashi et al.

Privacy-preserving machine learning (PPML) aims at enabling machine learning (ML) algorithms to be used on sensitive data. We contribute to this line of research by proposing a framework that allows efficient and secure evaluation of full-fledged state-of-the-art ML algorithms via secure multi-party computation (MPC). This is in contrast to most prior works, which substitute ML algorithms with approximated "MPC-friendly" variants. A drawback of the latter approach is that fine-tuning of the combined ML and MPC algorithms is required, which might lead to less efficient algorithms or inferior quality ML. This is an issue for secure deep neural networks (DNN) training in particular, as this involves arithmetic algorithms thought to be "MPC-unfriendly", namely, integer division, exponentiation, inversion, and square root. In this work, we propose secure and efficient protocols for the above seemingly MPC-unfriendly computations. Our protocols are three-party protocols in the honest-majority setting, and we propose both passively secure and actively secure with abort variants. A notable feature of our protocols is that they simultaneously provide high accuracy and efficiency. This framework enables us to efficiently and securely compute modern ML algorithms such as Adam and the softmax function "as is", without resorting to approximations. As a result, we obtain secure DNN training that outperforms state-of-the-art three-party systems; our full training is up to 6.7 times faster than just the online phase of the recently proposed FALCON@PETS'21 on a standard benchmark network. We further perform measurements on real-world DNNs, AlexNet and VGG16. The performance of our framework is up to a factor of about 12-14 faster for AlexNet and 46-48 faster for VGG16 to achieve an accuracy of 70% and 75%, respectively, when compared to FALCON.

CRNov 11, 2019
Privacy-Preserving Multiple Tensor Factorization for Synthesizing Large-Scale Location Traces with Cluster-Specific Features

Takao Murakami, Koki Hamada, Yusuke Kawamoto et al.

With the widespread use of LBSs (Location-based Services), synthesizing location traces plays an increasingly important role in analyzing spatial big data while protecting user privacy. In particular, a synthetic trace that preserves a feature specific to a cluster of users (e.g., those who commute by train, those who go shopping) is important for various geo-data analysis tasks and for providing a synthetic location dataset. Although location synthesizers have been widely studied, existing synthesizers do not provide sufficient utility, privacy, or scalability, hence are not practical for large-scale location traces. To overcome this issue, we propose a novel location synthesizer called PPMTF (Privacy-Preserving Multiple Tensor Factorization). We model various statistical features of the original traces by a transition-count tensor and a visit-count tensor. We factorize these two tensors simultaneously via multiple tensor factorization, and train factor matrices via posterior sampling. Then we synthesize traces from reconstructed tensors, and perform a plausible deniability test for a synthetic trace. We comprehensively evaluate PPMTF using two datasets. Our experimental results show that PPMTF preserves various statistical features including cluster-specific features, protects user privacy, and synthesizes large-scale location traces in practical time. PPMTF also significantly outperforms the state-of-the-art methods in terms of utility and scalability at the same level of privacy.