Teng Wang

h-index: 18

6 papers

596 citations

Novelty: 42%

AI Score: 30

Ranked #138,420 of 194,257 authors (top 71%); #3,544 in CR (top 52%)

6 Papers

15.2 · CL · Sep 17, 2024 · Code

Large Language Models are Good Multi-lingual Learners: When LLMs Meet Cross-lingual Prompts

Teng Wang, Zhenqi He, Wing-Yin Yu et al.

With the advent of Large Language Models (LLMs), generating rule-based data for real-world applications has become more accessible. Because natural language is inherently ambiguous and rule sets are complex, especially in long contexts, LLMs often struggle to follow all specified rules and frequently omit at least one. To improve the reasoning and understanding of LLMs on long and complex contexts, we propose a novel prompting strategy, Multi-Lingual Prompt (MLPrompt), which automatically translates the error-prone rule that an LLM struggles to follow into another language, thereby drawing greater attention to it. Experimental results on public datasets across various tasks show that MLPrompt can outperform state-of-the-art prompting methods such as Chain of Thought, Tree of Thought, and Self-Consistency. Additionally, we introduce a framework that integrates MLPrompt with an auto-checking mechanism for structured data generation, with a case study on text-to-MIP instances. Finally, we extend the framework to text-to-SQL to demonstrate its ability to synthesize structured data.
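
A minimal sketch of the control flow described above, under stated assumptions: the function names (`build_mlprompt`, `find_violations`, `mlprompt_loop`) are hypothetical, `llm` and `translate` are placeholder callables standing in for a model call and a translation step, and each rule's check is a simple predicate standing in for the auto-checking mechanism. It is not the authors' code; it only shows the idea of restating a failed rule in another language and re-prompting.

```python
from typing import Callable, List, Set, Tuple

def build_mlprompt(task: str, rules: List[str], violated: Set[int],
                   translate: Callable[[str], str]) -> str:
    """Assemble a prompt in which each previously violated rule is restated
    in another language (hypothetical helper, not the authors' code)."""
    lines = [task, "Rules:"]
    for i, rule in enumerate(rules):
        # Only the error-prone rules are translated; the rest stay as written.
        text = translate(rule) if i in violated else rule
        lines.append(f"{i + 1}. {text}")
    return "\n".join(lines)

def find_violations(output: str, checks: List[Callable[[str], bool]]) -> Set[int]:
    """Auto-checking step: indices of the rules whose check fails on output."""
    return {i for i, check in enumerate(checks) if not check(output)}

def mlprompt_loop(task: str, rules: List[str],
                  checks: List[Callable[[str], bool]],
                  llm: Callable[[str], str],
                  translate: Callable[[str], str],
                  max_rounds: int = 3) -> Tuple[str, Set[int]]:
    """Generate, check, translate the failed rules, and regenerate."""
    violated: Set[int] = set()
    output = ""
    for _ in range(max_rounds):
        output = llm(build_mlprompt(task, rules, violated, translate))
        failures = find_violations(output, checks)
        if not failures:
            break
        violated |= failures  # accumulate every rule the model has struggled with
    return output, violated
```

Accumulating `violated` across rounds is a choice made for this sketch; translating only the newest failures would be an equally plausible reading of the abstract.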

23.4 · CR · Oct 11, 2020

A Comprehensive Survey on Local Differential Privacy Toward Data Statistics and Analysis

Teng Wang, Xuefeng Zhang, Jingyu Feng et al.

Collecting and analyzing the massive data generated by smart devices has become increasingly pervasive in crowdsensing, and such data are the building blocks of data-driven decision-making. However, extensive statistics and analysis of such data can seriously threaten the privacy of participating users. Local differential privacy (LDP) has emerged as a prevalent privacy model with a distributed architecture that provides strong privacy guarantees for each user while data are collected and analyzed. Under LDP, each user's data is first perturbed locally on the client side and only then sent to the server, which protects the data from privacy leaks on both the client side and the server side. This survey presents a comprehensive and systematic overview of LDP with respect to privacy models, research tasks, enabling mechanisms, and applications. Specifically, we first provide a theoretical summary of LDP, including the LDP model, its variants, and the basic framework of LDP algorithms. Then, we investigate and compare LDP mechanisms for various data statistics and analysis tasks from the perspectives of frequency estimation, mean estimation, and machine learning. Moreover, we summarize practical LDP-based application scenarios. Finally, we outline several future research directions under LDP.
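
To make the client-side/server-side split concrete, here is a sketch of generalized randomized response (k-RR), a standard LDP frequency-estimation mechanism of the kind such surveys cover. It is an illustrative choice, not a mechanism singled out by this survey, and it assumes each user's value is an integer in `range(k)`.

```python
import math
import random
from collections import Counter
from typing import List, Sequence

def krr_perturb(value: int, k: int, epsilon: float, rng: random.Random) -> int:
    """Client side: keep the true value with prob p = e^eps / (e^eps + k - 1),
    otherwise report one of the other k - 1 values uniformly at random."""
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p:
        return value
    other = rng.randrange(k - 1)          # index among the k - 1 other values
    return other if other < value else other + 1

def krr_estimate(reports: Sequence[int], k: int, epsilon: float) -> List[float]:
    """Server side: unbiased frequency estimates from the perturbed reports.
    Each other value is reported with prob q = 1 / (e^eps + k - 1)."""
    e = math.exp(epsilon)
    p = e / (e + k - 1)
    q = 1.0 / (e + k - 1)
    n = len(reports)
    counts = Counter(reports)
    return [(counts[v] / n - q) / (p - q) for v in range(k)]
```

Each report satisfies ε-LDP because p/q = e^ε; the server never sees a raw value and only debiases the aggregate counts, so individual estimates can be slightly negative for rare values.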

31.9 · CR · Apr 19, 2020

Local Differential Privacy based Federated Learning for Internet of Things

Yang Zhao, Jun Zhao, Mengmeng Yang et al.

The Internet of Vehicles (IoV) is a promising branch of the Internet of Things. IoV supports a wide variety of crowdsourcing applications such as Waze, Uber, and Amazon Mechanical Turk. Users of these applications report real-time traffic information to a cloud server, which trains a machine learning model on the reported information for intelligent traffic management. However, crowdsourcing application owners can easily infer users' location information, which raises severe location privacy concerns. In addition, as the number of vehicles increases, frequent communication between vehicles and the cloud server incurs a substantial communication cost. To avoid the privacy threat and reduce the communication cost, this paper proposes integrating federated learning and local differential privacy (LDP) so that crowdsourcing applications can train the machine learning model. Specifically, we propose four LDP mechanisms to perturb the gradients generated by vehicles. The Three-Outputs mechanism introduces three possible outputs to deliver high accuracy when the privacy budget is small, and these outputs can be encoded with two bits to reduce the communication cost. To maximize performance when the privacy budget is large, we propose an optimal piecewise mechanism (PM-OPT). We further propose a suboptimal mechanism (PM-SUB) with a simple formula and utility comparable to PM-OPT. Finally, we build a novel hybrid mechanism by combining Three-Outputs and PM-SUB.
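
The abstract does not give the formulas for Three-Outputs, PM-OPT, or PM-SUB, so the sketch below instead shows the original piecewise mechanism (PM) of Wang et al. (ICDE 2019), the family these variants are named after, applied to federated gradient perturbation. Clipping each coordinate to [-1, 1], splitting ε evenly across coordinates, and plain averaging on the server are assumptions made for illustration; `perturb_gradient` and `aggregate` are hypothetical helpers.

```python
import math
import random
from typing import List

def piecewise_perturb(t: float, epsilon: float, rng: random.Random) -> float:
    """Piecewise mechanism (Wang et al., ICDE 2019) for t in [-1, 1].
    Returns a value in [-c, c] whose expectation equals t (epsilon-LDP)."""
    if not -1.0 <= t <= 1.0:
        raise ValueError("t must lie in [-1, 1]")
    a = math.exp(epsilon / 2)
    c = (a + 1) / (a - 1)
    left = (c + 1) / 2 * t - (c - 1) / 2
    right = left + c - 1
    if rng.random() < a / (a + 1):
        # High-probability band [left, right] around the scaled input.
        return rng.uniform(left, right)
    # Otherwise draw uniformly from [-c, left) U (right, c], total length c + 1.
    x = rng.uniform(0.0, c + 1)
    if x < left + c:
        return -c + x
    return right + (x - (left + c))

def perturb_gradient(grad: List[float], epsilon: float,
                     rng: random.Random) -> List[float]:
    """Hypothetical client step: clip each coordinate to [-1, 1] and perturb it.
    By sequential composition, the d coordinates share the budget epsilon."""
    eps_each = epsilon / len(grad)
    return [piecewise_perturb(max(-1.0, min(1.0, g)), eps_each, rng) for g in grad]

def aggregate(reports: List[List[float]]) -> List[float]:
    """Hypothetical server step: coordinate-wise mean of the noisy gradients."""
    n = len(reports)
    return [sum(column) / n for column in zip(*reports)]
```

Because each perturbed coordinate is unbiased, the server's average converges to the true mean gradient as more vehicles report; the paper's variants aim to reduce the variance of this step.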

17.0 · CR · Nov 27, 2019

Reviewing and Improving the Gaussian Mechanism for Differential Privacy

Jun Zhao, Teng Wang, Tao Bai et al.

Differential privacy provides a rigorous framework to quantify data privacy and has received considerable interest recently. A randomized mechanism satisfying $(ε, δ)$-differential privacy (DP) roughly means that, except with a small probability $δ$, altering a record in a dataset cannot change the probability that an output is seen by more than a multiplicative factor $e^{ε}$. A well-known solution to $(ε, δ)$-DP is the Gaussian mechanism introduced by Dwork et al. [1] in 2006 and improved by Dwork and Roth [2] in 2014, which adds Gaussian noise with standard deviation $\sqrt{2\ln \frac{2}{δ}} \times \frac{Δ}{ε}$ in [1] or $\sqrt{2\ln \frac{1.25}{δ}} \times \frac{Δ}{ε}$ in [2] independently to each dimension of the query result, for a query with $\ell_2$-sensitivity $Δ$. Although both classical Gaussian mechanisms [1,2] assume $0 < ε \leq 1$, our review finds that many studies in the literature have used them under values of $ε$ and $δ$ for which the added noise amounts of [1,2] do not achieve $(ε, δ)$-DP. We obtain this result by analyzing the optimal noise amount $σ_{\text{DP-OPT}}$ for $(ε, δ)$-DP and identifying the values of $ε$ and $δ$ for which the noise amounts of the classical mechanisms are less than $σ_{\text{DP-OPT}}$. Since $σ_{\text{DP-OPT}}$ has no closed-form expression and must be approximated iteratively, we propose Gaussian mechanisms based on closed-form upper bounds for $σ_{\text{DP-OPT}}$. Our mechanisms achieve $(ε, δ)$-DP for any $ε$, whereas the classical mechanisms [1,2] do not achieve $(ε, δ)$-DP for large $ε$ given $δ$. Moreover, our mechanisms achieve better utility than [1,2] and come close to that of the optimal, yet more computationally expensive, Gaussian mechanism.
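
The iterative computation of the optimal noise amount can be sketched numerically. The code below uses the exact $(ε, δ)$-DP condition for the Gaussian mechanism from Balle and Wang (2018), $δ(σ) = Φ\left(\frac{Δ}{2σ} - \frac{εσ}{Δ}\right) - e^{ε} Φ\left(-\frac{Δ}{2σ} - \frac{εσ}{Δ}\right)$, and bisects for the smallest $σ$ satisfying it. We assume this smallest $σ$ coincides with $σ_{\text{DP-OPT}}$; the sketch does not reproduce the paper's closed-form upper bounds, and all function names are hypothetical.

```python
import math

def phi(x: float) -> float:
    """Standard normal CDF."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def sigma_dwork_2006(eps: float, delta: float, sens: float) -> float:
    """Classical noise amount of [1]."""
    return math.sqrt(2 * math.log(2 / delta)) * sens / eps

def sigma_dwork_roth_2014(eps: float, delta: float, sens: float) -> float:
    """Classical noise amount of [2]."""
    return math.sqrt(2 * math.log(1.25 / delta)) * sens / eps

def delta_for_sigma(sigma: float, eps: float, sens: float) -> float:
    """Smallest delta achieved by N(0, sigma^2) noise at this eps (Balle & Wang, 2018)."""
    a = sens / (2 * sigma)
    b = eps * sigma / sens
    return phi(a - b) - math.exp(eps) * phi(-a - b)

def sigma_opt(eps: float, delta: float, sens: float, iters: int = 200) -> float:
    """Bisect for the smallest sigma with delta_for_sigma(sigma) <= delta."""
    lo, hi = 0.0, sens
    while delta_for_sigma(hi, eps, sens) > delta:
        hi *= 2  # grow the bracket until hi is private enough
    for _ in range(iters):
        mid = (lo + hi) / 2
        if delta_for_sigma(mid, eps, sens) <= delta:
            hi = mid
        else:
            lo = mid
    return hi
```

With $Δ = 1$ and $δ = 10^{-5}$, the noise of [2] exceeds the bisected optimum at $ε = 0.5$ but falls below it at $ε = 10$, which is consistent with the claim that the classical mechanisms fail for large $ε$.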

13.0 · CR · Jul 11, 2019

Conditional Analysis for Key-Value Data with Local Differential Privacy

Lin Sun, Jun Zhao, Xiaojun Ye et al.

Local differential privacy (LDP) has become the de facto standard for privacy-preserving distributed data collection and analysis. Recently, researchers have extended LDP to key-value data, a basic data type in NoSQL systems, and shown its feasibility for mean estimation and frequency estimation. In this paper, we develop a set of new perturbation mechanisms for key-value data collection and analysis under the strong model of local differential privacy. Since many modern machine learning tasks rely on the availability of conditional probabilities or marginal statistics, we then propose conditional frequency estimation for key analysis and conditional mean estimation for value analysis in key-value data. The released conditional statistics can further be used in learning tasks. Extensive experiments on frequency and mean estimation with both synthetic and real-world datasets validate the effectiveness and accuracy of the proposed key-value perturbation mechanisms against state-of-the-art competitors.

17.0 · CR · Jun 5, 2019

Locally Differentially Private Data Collection and Analysis

Teng Wang, Jun Zhao, Xinyu Yang et al.

Local differential privacy (LDP) can provide each user with strong privacy guarantees under untrusted data curators while ensuring accurate statistics derived from privatized data. Owing to these properties, LDP has been widely adopted to protect privacy in various tasks (e.g., heavy hitter discovery, probability estimation) and systems (e.g., Google Chrome, Apple iOS). Although $ε$-LDP has been studied for many years, the more general notion of $(ε, δ)$-LDP has been studied in only a few papers, which mainly consider mean estimation for numeric data. Moreover, prior solutions achieve $(ε, δ)$-LDP via the Gaussian mechanism, which leads to low accuracy in the aggregated results. In this paper, we propose novel mechanisms that achieve $(ε, δ)$-LDP with high utility in data analytics and machine learning. Specifically, we first design $(ε, δ)$-LDP algorithms for collecting multi-dimensional numeric data, which achieve higher accuracy than the optimal Gaussian mechanism while guaranteeing strong privacy for each user. Then, we investigate different local protocols for categorical attributes under $(ε, δ)$-LDP. Furthermore, we theoretically analyze the error bounds and variance of the proposed algorithms. Experimental results on real and synthetic datasets demonstrate the high data utility of the proposed algorithms for both simple data statistics and complex machine learning models.