Qiyao Luo

CR · 3 papers · 24 citations · Novelty 77% · AI Score 46

3 Papers

57.5 · CR · May 1
Defense against Poisoning Attacks under Shuffle-DP

Siyi Wang, Qiyao Luo, Yihua Hu et al.

Differential Privacy (DP) has become the gold standard for protecting individual privacy in data analytics, and the shuffle-DP model has attracted significant attention from both academia and industry because of its favorable balance between privacy and utility. However, existing shuffle-DP protocols rely on a strong assumption: that all users behave honestly. In real-world scenarios, adversarial users can exploit this assumption through poisoning attacks, compromising both the privacy guarantees and the utility of analytical results. Although defending against poisoning attacks in the shuffle-DP model has recently gained interest, existing solutions are limited to frequency estimation. To address this gap, we propose the first general defense framework for all union-preserving queries, capable of transforming any shuffle-DP protocol into a version resilient to poisoning attacks. Beyond providing a robust defense, our framework preserves high utility. Compared to the original shuffle-DP protocol, it achieves the same asymptotic error in attack-free settings and incurs only a polylogarithmic increase in error when a constant number of attackers are present. We demonstrate the generality of our framework on several common queries, including summation, frequency estimation, and range counting. Experimental results confirm that our approach effectively defends against poisoning attacks while maintaining strong utility and communication efficiency.

CR · Nov 12, 2021
Frequency Estimation in the Shuffle Model with Almost a Single Message

Qiyao Luo, Yilei Wang, Ke Yi

We present a protocol in the shuffle model of differential privacy (DP) for the \textit{frequency estimation} problem that achieves error $\omega(1)\cdot O(\log n)$, almost matching the central-DP accuracy, with $1+o(1)$ messages per user. This exhibits a sharp transition phenomenon, since there is a lower bound of $\Omega(n^{1/4})$ if each user is allowed to send only one message. Previously, such a result was known only when the domain size $B$ is $o(n)$. For a large domain, we also need an efficient method to identify the \textit{heavy hitters} (i.e., elements that are frequent enough). For this purpose, we design a shuffle-DP protocol that uses $o(1)$ messages per user and can identify all heavy hitters in time polylogarithmic in $B$. Finally, by combining our frequency estimation and heavy hitter detection protocols, we show how to solve the $B$-dimensional \textit{1-sparse vector summation} problem in the high-dimensional setting $B=\Omega(n)$, achieving the optimal central-DP MSE $\tilde O(n)$ with $1+o(1)$ messages per user. In addition to error and message count, our protocols also improve message size and running time, and they are easy to implement. The experimental results demonstrate order-of-magnitude improvements over prior work.
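To make the three roles in the shuffle model concrete (local randomizer, shuffler, analyzer), here is a minimal sketch of a generic single-message frequency estimation baseline. It is not the authors' $1+o(1)$-message protocol: it uses a simple randomizer that, with probability `gamma`, replaces a user's value with a uniformly random element of the domain $\{0,\dots,B-1\}$, and an analyzer that debiases the resulting histogram. The function names `randomize`, `shuffle`, `analyze` and the parameter `gamma` are hypothetical, and how to set `gamma` for a target $(\varepsilon,\delta)$ (via privacy amplification by shuffling) is assumed and not shown.

```python
import random
from collections import Counter

def randomize(x, B, gamma, rng):
    """Hypothetical local randomizer: with probability gamma, replace the
    user's value x in {0, ..., B-1} by a uniformly random domain element."""
    if rng.random() < gamma:
        return rng.randrange(B)
    return x

def shuffle(messages, rng):
    """Shuffler: forward all messages in a uniformly random order, which
    removes the link between each message and its sender."""
    out = list(messages)
    rng.shuffle(out)
    return out

def analyze(messages, B, gamma):
    """Analyzer: debias the histogram of the shuffled messages.
    Since E[count_v] = (1 - gamma) * f_v + gamma * n / B, invert that map."""
    n = len(messages)
    counts = Counter(messages)
    return [(counts[v] - gamma * n / B) / (1 - gamma) for v in range(B)]

if __name__ == "__main__":
    rng = random.Random(0)
    B, gamma = 10, 0.2
    # Synthetic data: 500 users hold 0, 300 hold 1, 200 hold random values.
    data = [0] * 500 + [1] * 300 + [rng.randrange(B) for _ in range(200)]
    reports = shuffle([randomize(x, B, gamma, rng) for x in data], rng)
    for v, est in enumerate(analyze(reports, B, gamma)):
        print(v, round(est, 1))
```

Because every user sends exactly one message here, this baseline is in the single-message regime where the $\Omega(n^{1/4})$ lower bound quoted above applies; the protocol described in the abstract avoids that bound by allowing $1+o(1)$ messages per user.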

CR · Sep 30, 2021
Secure Machine Learning over Relational Data

Qiyao Luo, Yilei Wang, Zhenghang Ren et al.

A closer integration of machine learning and relational databases has gained momentum in recent years, because the training data for many ML tasks is the result of a relational query (most often, a join-select query). In a federated setting, this poses an additional challenge: the tables are held by different parties as their private data, and the parties would like to train the model without relying on a trusted third party. Existing work has only considered the case where the training data is stored in a flat table that has been vertically partitioned, which corresponds to a simple PK-PK join. In this paper, we describe secure protocols to compute the join results of multiple tables conforming to a general foreign-key acyclic schema, and we show how to feed the results in secret-shared form to a secure ML toolbox. Furthermore, existing secure ML systems reveal the PKs in the join results. We strengthen privacy protection beyond this and achieve zero information leakage beyond the trained model. If the model itself is considered sensitive, we show how differential privacy can be incorporated into our framework to also prevent the model from breaching individuals' privacy.