CRSCSY Feb 7, 2022

Differential Privacy for Symbolic Systems with Application to Markov Chains

arXiv:2202.03325v2, 31 citations
AI Analysis

This work addresses privacy for symbolic systems, a domain-specific problem, and is incremental in that it extends differential privacy to a new type of data.

The authors address privacy protection for non-numerical, symbolic data, such as trajectories represented as words, by developing a differential privacy framework with offline and online mechanisms. They derive statistical accuracy bounds and validate them numerically on strings of English words.

Data-driven systems are gathering increasing amounts of data from users, and sensitive user data requires privacy protections. In some cases, the data gathered is non-numerical or symbolic, and conventional approaches to privacy, e.g., adding noise, do not apply, though such systems still require privacy protections. Accordingly, we present a novel differential privacy framework for protecting trajectories generated by symbolic systems. These trajectories can be represented as words or strings over a finite alphabet. We develop new differential privacy mechanisms that approximate a sensitive word using a random word that is likely to be near it. An offline mechanism is implemented efficiently using a Modified Hamming Distance Automaton to generate whole privatized output words over a finite time horizon. Then, an online mechanism is implemented by taking in a sensitive symbol and generating a randomized output symbol at each timestep. This work is extended to Markov chains to generate differentially private state sequences that a given Markov chain could have produced. Statistical accuracy bounds are developed to quantify the accuracy of these mechanisms, and numerical results validate the accuracy of these techniques for strings of English words.
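To make the idea of replacing a sensitive word with "a random word that is likely to be near it" concrete, the following is a minimal sketch, not the authors' mechanism. It assumes that two words are adjacent when their Hamming distance is at most k, and it applies a generic exponential mechanism whose output probability decays with Hamming distance from the input. Over the full set of words of a fixed length this distribution factorizes per symbol, so it can be sampled one symbol at a time, in the spirit of an online mechanism. The function names, the adjacency parameter k, and the example alphabet are illustrative assumptions. The sketch does not implement the Modified Hamming Distance Automaton or the Markov chain feasibility constraint described in the abstract.

```python
import math
import random


def keep_probability(epsilon, k, alphabet_size):
    """Probability of emitting the true symbol (hypothetical helper).

    Assumes output words are drawn with probability proportional to
    exp(-(epsilon / k) * hamming(input, output)). The normalizer is the same
    for every input word of a given length, so inputs within Hamming distance
    k have output probabilities within a factor exp(epsilon) of each other.
    """
    c = epsilon / k
    return 1.0 / (1.0 + (alphabet_size - 1) * math.exp(-c))


def privatize_symbol(symbol, alphabet, epsilon, k, rng):
    """Emit the true symbol with keep_probability, else a uniform other symbol."""
    if rng.random() < keep_probability(epsilon, k, len(alphabet)):
        return symbol
    others = [a for a in alphabet if a != symbol]
    return rng.choice(others)


def privatize_word(word, alphabet, epsilon, k, rng=None):
    """Privatize a word symbol by symbol (assumed per-timestep processing)."""
    rng = rng or random.Random()
    return "".join(privatize_symbol(s, alphabet, epsilon, k, rng) for s in word)


if __name__ == "__main__":
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    print(privatize_word("privacy", alphabet, epsilon=5.0, k=1))
```

A larger epsilon or a smaller k raises the probability of keeping each symbol, so the output word tends to lie closer to the sensitive word. A smaller epsilon pushes the output toward a uniformly random word of the same length.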

