Farhad Farokhi

CR
h-index2
57papers
776citations
Novelty49%
AI Score47

57 Papers

OCMay 5, 2014
Distributed MPC Via Dual Decomposition and Alternating Direction Method of Multipliers

Farhad Farokhi, Iman Shames, Karl H. Johansson

A conventional way to handle model predictive control (MPC) problems distributedly is to solve them via dual decomposition and gradient ascent. However, at each time-step, it might not be feasible to wait for the dual algorithm to converge. As a result, the algorithm might be needed to be terminated prematurely. One is then interested to see if the solution at the point of termination is close to the optimal solution and when one should terminate the algorithm if a certain distance to optimality is to be guaranteed. In this chapter, we look at this problem for distributed systems under general dynamical and performance couplings, then, we make a statement on validity of similar results where the problem is solved using alternating direction method of multipliers.

OCApr 27, 2012
Optimal Structured Static State-Feedback Control Design with Limited Model Information for Fully-Actuated Systems

Farhad Farokhi, Cedric Langbort, Karl H. Johansson

We introduce the family of limited model information control design methods, which construct controllers by accessing the plant's model in a constrained way, according to a given design graph. We investigate the closed-loop performance achievable by such control design methods for fully-actuated discrete-time linear time-invariant systems, under a separable quadratic cost. We restrict our study to control design methods which produce structured static state feedback controllers, where each subcontroller can at least access the state measurements of those subsystems that affect its corresponding subsystem. We compute the optimal control design strategy (in terms of the competitive ratio and domination metrics) when the control designer has access to the local model information and the global interconnection structure of the plant-to-be-controlled. Lastly, we study the trade-off between the amount of model information exploited by a control design method and the best closed-loop performance (in terms of the competitive ratio) of controllers it can produce.

OCOct 15, 2013
Stochastic Sensor Scheduling for Networked Control Systems

Farhad Farokhi, Karl H. Johansson

Optimal sensor scheduling with applications to networked estimation and control systems is considered. We model sensor measurement and transmission instances using jumps between states of a continuous-time Markov chain. We introduce a cost function for this Markov chain as the summation of terms depending on the average sampling frequencies of the subsystems and the effort needed for changing the parameters of the underlying Markov chain. By minimizing this cost function through extending Brockett's recent approach to optimal control of Markov chains, we extract an optimal scheduling policy to fairly allocate the network resources among the control loops. We study the statistical properties of this scheduling policy in order to compute upper bounds for the closed-loop performance of the networked system, where several decoupled scalar subsystems are connected to their corresponding estimator or controller through a shared communication medium. We generalize the estimation results to observable subsystems of arbitrary order. Finally, we illustrate the developed results numerically on a networked system composed of several decoupled water tanks.

OCDec 4, 2013
Optimal Control Design under Limited Model Information for Discrete-Time Linear Systems with Stochastically-Varying Parameters

Farhad Farokhi, Karl H. Johansson

The value of plant model information available in the control design process is discussed. We design optimal state-feedback controllers for interconnected discrete-time linear systems with stochastically-varying parameters. The parameters are assumed to be independently and identically distributed random variables in time. The design of each controller relies only on (i) exact local plant model information and (ii) statistical beliefs about the model of the rest of the system. We consider both finite-horizon and infinite-horizon quadratic cost functions. The optimal state-feedback controller is derived in both cases. The optimal controller is shown to be linear in the state and to depend on the model parameters and their statistics in a particular way. Furthermore, we study the value of model information in optimal control design using the performance degradation ratio which is defined as the supremum (over all possible initial conditions) of the ratio of the cost of the optimal controller with limited model information scaled by the cost of the optimal controller with full model information. An upper bound for the performance degradation ratio is presented for the case of fully-actuated subsystems. Comparisons are made between designs based on limited, statistical, and full model information. Throughout the paper, we use a power network example to illustrate concepts and results.

OCAug 28, 2018
Ensuring Privacy with Constrained Additive Noise by Minimizing Fisher Information

Farhad Farokhi, Henrik Sandberg

The problem of preserving the privacy of individual entries of a database when responding to linear or nonlinear queries with constrained additive noise is considered. For privacy protection, the response to the query is systematically corrupted with an additive random noise whose support is a subset or equal to a pre-defined constraint set. A measure of privacy using the inverse of the trace of the Fisher information matrix is developed. The Cramer-Rao bound relates the variance of any estimator of the database entries to the introduced privacy measure. The probability density that minimizes the trace of the Fisher information (as a proxy for maximizing the measure of privacy) is computed. An extension to dynamic problems is also presented. Finally, the results are compared to the differential privacy methodology.

OCJul 22, 2014
Adaptive Control Design under Structured Model Information Limitation: A Cost-Biased Maximum-Likelihood Approach

Farhad Farokhi, Karl H. Johansson

Networked control strategies based on limited information about the plant model usually results in worse closed-loop performance than optimal centralized control with full plant model information. Recently, this fact has been established by utilizing the concept of competitive ratio, which is defined as the worst case ratio of the cost of a control design with limited model information to the cost of the optimal control design with full model information. We show that an adaptive controller, inspired by a controller proposed by Campi and Kumar, with limited plant model information, asymptotically achieves the closed-loop performance of the optimal centralized controller with full model information for almost any plant. Therefore, there exists, at least, one adaptive control design strategy with limited plant model information that can achieve a competitive ratio equal to one. The plant model considered in the paper belongs to a compact set of stochastic linear time-invariant systems and the closed loop performance measure is the ergodic mean of a quadratic function of the state and control input. We illustrate the applicability of the results numerically on a vehicle platooning problem.

SYJul 16, 2019
Information-Theoretic Privacy through Chaos Synchronization and Optimal Additive Noise

Carlos Murguia, Iman Shames, Farhad Farokhi et al.

We study the problem of maximizing privacy of data sets by adding random vectors generated via synchronized chaotic oscillators. In particular, we consider the setup where information about data sets, queries, is sent through public (unsecured) communication channels to a remote station. To hide private features (specific entries) within the data set, we corrupt the response to queries by adding random vectors. We send the distorted query (the sum of the requested query and the random vector) through the public channel. The distribution of the additive random vector is designed to minimize the mutual information (our privacy metric) between private entries of the data set and the distorted query. We cast the synthesis of this distribution as a convex program in the probabilities of the additive random vector. Once we have the optimal distribution, we propose an algorithm to generate pseudo-random realizations from this distribution using trajectories of a chaotic oscillator. At the other end of the channel, we have a second chaotic oscillator, which we use to generate realizations from the same distribution. Note that if we obtain the same realizations on both sides of the channel, we can simply subtract the realization from the distorted query to recover the requested query. To generate equal realizations, we need the two chaotic oscillators to be synchronized, i.e., we need them to generate exactly the same trajectories on both sides of the channel synchronously in time. We force the two chaotic oscillators into exponential synchronization using a driving signal. Simulations are presented to illustrate our results.

SYSep 23, 2012
Complexity Reduction for Parameter-Dependent Linear Systems

Farhad Farokhi, Henrik Sandberg, Karl H. Johansson

We present a complexity reduction algorithm for a family of parameter-dependent linear systems when the system parameters belong to a compact semi-algebraic set. This algorithm potentially describes the underlying dynamical system with fewer parameters or state variables. To do so, it minimizes the distance (i.e., H-infinity-norm of the difference) between the original system and its reduced version. We present a sub-optimal solution to this problem using sum-of-squares optimization methods. We present the results for both continuous-time and discrete-time systems. Lastly, we illustrate the applicability of our proposed algorithm on numerical examples.

OCOct 12, 2017
Scalable computation for optimal control of cascade systems with constraints

Michael Cantoni, Farhad Farokhi, Eric C. Kerrigan et al.

A method is devised for numerically solving a class of finite-horizon optimal control problems subject to cascade linear discrete-time dynamics. It is assumed that the linear state and input inequality constraints, and the quadratic measure of performance, are all separable with respect to the spatial dimension of the underlying cascade of sub-systems, as well as the temporal dimension of the dynamics. By virtue of this structure, the computation cost of an interior-point method for an equivalent quadratic programming formulation of the optimal control problem can be made to scale linearly with the number of sub-systems. However, the complexity of this approach grows cubically with the time horizon. As such, computational advantage becomes apparent in situations where the number of sub-systems is relatively large. In any case, the method is amenable to distributed computation with low communication overhead and only immediate upstream neighbour sharing of partial model data among processing agents. An example is presented to illustrate an application of the main results to model data for the cascade dynamics of an automated irrigation channel.

OCFeb 14, 2016
Budget-Constrained Contract Design for Effort-Averse Sensors in Averaging Based Estimation

Farhad Farokhi, Iman Shames, Michael Cantoni

Consider a group of effort-averse, or lazy, sensors that seek to minimize the effort invested to collect measurements of a variable. Increasing the effort invested by the sensors improves the quality of the measurements provided to the central planner but this incurs increased costs to the sensors. The central planner, which processes the sensor measurements, employs an averaging estimator. It also determines contracts for rewarding sensors based on the measurements obtained. The problem of designing a contract that yields an estimation-error based quality-of-service level in return for the reward extended to sensors is investigated in this paper. To this end, a game is formulated between the central planner and the sensors. Conditions for the existence and uniqueness of an equilibrium are identified. The equilibrium is constructed explicitly and its properties in response to a reward based contract are studied. It turns out that the central planner, while not being able to directly measure the effort invested by the sensors, can enhance the estimation quality by rewarding each sensor based on the distance of its measurements from the output of the averaging estimator. Ultimately, optimal contracts are designed from the perspective of the budget required for achieving a specified level of estimation error.

GTMar 10, 2015
Promoting Truthful Behaviour in Participatory-Sensing Mechanisms

Farhad Farokhi, Iman Shames, Michael Cantoni

In this paper, the interplay between a class of nonlinear estimators and strategic sensors is studied in several participatory-sensing scenarios. It is shown that for the class of estimators, if the strategic sensors have access to noiseless measurements of the to-be-estimated-variable, truth-telling is an equilibrium of the game that models the interplay between the sensors and the estimator. Furthermore, performance of the proposed estimators is examined in the case that the strategic sensors form coalitions and in the presence of noise.

OCSep 18, 2023
Distributionally Time-Varying Online Stochastic Optimization under Polyak-Łojasiewicz Condition with Application in Conditional Value-at-Risk Statistical Learning

Yuen-Man Pun, Farhad Farokhi, Iman Shames

In this work, we consider a sequence of stochastic optimization problems following a time-varying distribution via the lens of online optimization. Assuming that the loss function satisfies the Polyak-Łojasiewicz condition, we apply online stochastic gradient descent and establish its dynamic regret bound that is composed of cumulative distribution drifts and cumulative gradient biases caused by stochasticity. The distribution metric we adopt here is Wasserstein distance, which is well-defined without the absolute continuity assumption or with a time-varying support set. We also establish a regret bound of online stochastic proximal gradient descent when the objective function is regularized. Moreover, we show that the above framework can be applied to the Conditional Value-at-Risk (CVaR) learning problem. Particularly, we improve an existing proof on the discovery of the PL condition of the CVaR problem, resulting in a regret bound of online stochastic gradient descent.

SPFeb 2, 2019
State Estimation over Worst-Case Erasure and Symmetric Channels with Memory

Amir Saberi, Farhad Farokhi, Girish N. Nair

Worst-case models of erasure and symmetric channels are investigated, in which the number of channel errors occurring in each sliding window of a given length is bounded. Upper and lower bounds on their zero-error capacities are derived, with the lower bounds revealing a connection with the topological entropy of the channel dynamics. Necessary and sufficient conditions for linear state estimation with bounded estimation errors via such channels are then obtained, by extending previous results for non-stochastic memoryless channels to those with finite memory. These estimation conditions involve the topological entropies of the linear system and the channel.

LGSep 20, 2023
Information Leakage from Data Updates in Machine Learning Models

Tian Hui, Farhad Farokhi, Olga Ohrimenko

In this paper we consider the setting where machine learning models are retrained on updated datasets in order to incorporate the most up-to-date information or reflect distribution shifts. We investigate whether one can infer information about these updates in the training data (e.g., changes to attribute values of records). Here, the adversary has access to snapshots of the machine learning model before and after the change in the dataset occurs. Contrary to the existing literature, we assume that an attribute of a single or multiple training data points are changed rather than entire data records are removed or added. We propose attacks based on the difference in the prediction confidence of the original model and the updated model. We evaluate our attack methods on two public datasets along with multi-layer perceptron and logistic regression models. We validate that two snapshots of the model can result in higher information leakage in comparison to having access to only the updated model. Moreover, we observe that data records with rare values are more vulnerable to attacks, which points to the disparate vulnerability of privacy attacks in the update setting. When multiple records with the same original attribute value are updated to the same new value (i.e., repeated changes), the attacker is more likely to correctly guess the updated values since repeated changes leave a larger footprint on the trained model. These observations point to vulnerability of machine learning models to attribute inference attacks in the update setting.

29.8QUANT-PHMay 20
Precision and Privacy in Distributed Quantum Sensing: A Quantum Fisher Information Duality

Farhad Farokhi

We establish a quantum Fisher information (QFI) duality for distributed quantum sensor networks with local phase encoding. For any $N$-qubit probe state, where $N$ denotes the number of sensors, $F_Q(\boldsymbol{w}^\top \boldsymbolθ) + F_Q(\boldsymbol{v}^\top \boldsymbolθ) \leq N$ for all unit orthogonal sensing directions $\boldsymbol{w}$ and $\boldsymbol{v}$, with equality for all equatorial states when $N=2$ and for Greenberger--Horne--Zeilinger (GHZ) states when $N\geq 2$. Heisenberg-limited precision for direction $\boldsymbol{w}$, $F_Q(\boldsymbol{w}^\top \boldsymbolθ)=N$, saturates the bound and simultaneously forces zero QFI for all other independent directions. This can be interpreted as the condition for parameter privacy in distributed quantum sensing: attaining Heisenberg-limited precision for the sensing target renders all alternative privacy-intrusive estimations impossible.

CRSep 21, 2025
Privacy-Preserving State Estimation with Crowd Sensors: An Information-Theoretic Respective

Farhad Farokhi

Privacy-preserving state estimation for linear time-invariant dynamical systems with crowd sensors is considered. At any time step, the estimator has access to measurements from a randomly selected sensor from a pool of sensors with pre-specified models and noise profiles. A Luenberger-like observer is used to fuse the measurements with the underlying model of the system to recursively generate the state estimates. An additive privacy-preserving noise is used to constrain information leakage. Information leakage is measured via mutual information between the identity of the sensors and the state estimate conditioned on the actual state of the system. This captures an omnipotent adversary that not only can access state estimates but can also gather direct high-quality state measurements. Any prescribed level of information leakage is shown to be achievable by appropriately selecting the variance of the privacy-preserving noise. Therefore, privacy-utility trade-off can be fine-tuned.

QUANT-PHApr 12, 2024
Optimal Universal Quantum Encoding for Statistical Inference

Farhad Farokhi

Optimal encoding of classical data for statistical inference using quantum computing is investigated. A universal encoder is sought that is optimal for a wide array of statistical inference tasks. Accuracy of any statistical inference is shown to be upper bounded by a term that is proportional to maximal quantum leakage from the classical data, i.e., the input to the inference model, through its quantum encoding. This demonstrates that the maximal quantum leakage is a universal measure of the quality of the encoding strategy for statistical inference as it only depends on the quantum encoding of the data and not the inference task itself. The optimal universal encoding strategy, i.e., the encoding strategy that maximizes the maximal quantum leakage, is proved to be attained by pure states. When there are enough qubits, basis encoding is proved to be universally optimal. An iterative method for numerically computing the optimal universal encoding strategy is presented.

LGNov 1, 2021
Learning Safety Filters for Unknown Discrete-Time Linear Systems

Farhad Farokhi, Alex S. Leong, Mohammad Zamani et al.

A learning-based safety filter is developed for discrete-time linear time-invariant systems with unknown models subject to Gaussian noises with unknown covariance. Safety is characterized using polytopic constraints on the states and control inputs. The empirically learned model and process noise covariance with their confidence bounds are used to construct a robust optimization problem for minimally modifying nominal control actions to ensure safety with high probability. The optimization problem relies on tightening the original safety constraints. The magnitude of the tightening is larger at the beginning since there is little information to construct reliable models, but shrinks with time as more data becomes available.

ROOct 11, 2021
Optimal Stochastic Evasive Maneuvers Using the Schrodinger's Equation

Farhad Farokhi, Magnus Egerstedt

In this paper, preys with stochastic evasion policies are considered. The stochasticity adds unpredictable changes to the prey's path for avoiding predator's attacks. The prey's cost function is composed of two terms balancing the unpredictability factor (by using stochasticity to make the task of forecasting its future positions by the predator difficult) and energy consumption (the least amount of energy required for performing a maneuver). The optimal probability density functions of the actions of the prey for trading-off unpredictability and energy consumption is shown to be characterized by the stationary Schrodinger's equation.

CRJun 18, 2021
Sharing in a Trustless World: Privacy-Preserving Data Analytics with Potentially Cheating Participants

Tham Nguyen, Hassan Jameel Asghar, Raghav Bhakar et al.

Lack of trust between organisations and privacy concerns about their data are impediments to an otherwise potentially symbiotic joint data analysis. We propose DataRing, a data sharing system that allows mutually mistrusting participants to query each others' datasets in a privacy-preserving manner while ensuring the correctness of input datasets and query answers even in the presence of (cheating) participants deviating from their true datasets. By relying on the assumption that if only a small subset of rows of the true dataset are known, participants cannot submit answers to queries deviating significantly from their true datasets. We employ differential privacy and a suite of cryptographic tools to ensure individual privacy for each participant's dataset and data confidentiality from the system. Our results show that the evaluation of 10 queries on a dataset with 10 attributes and 500,000 records is achieved in 90.63 seconds. DataRing could detect cheating participant that deviates from its true dataset in few queries with high accuracy.

LGMar 2, 2021
Safe Learning of Uncertain Environments

Farhad Farokhi, Alex Leong, Iman Shames et al.

In many learning based control methodologies, learning the unknown dynamic model precedes the control phase, while the aim is to control the system such that it remains in some safe region of the state space. In this work, our aim is to guarantee safety while learning and control proceed simultaneously. Specifically, we consider the problem of safe learning in nonlinear control-affine systems subject to unknown additive uncertainty. We first model the uncertainty as a Gaussian noise and use state measurements to learn its mean and covariance. We provide rigorous time-varying bounds on the mean and covariance of the uncertainty and employ them to modify the control input via an optimization program with potentially time-varying safety constraints. We show that with an arbitrarily large probability we can guarantee that the state will remain in the safe set, while learning and control are carried out simultaneously, provided that a feasible solution exists for the optimization problem. We provide a secondary formulation of this optimization that is computationally more efficient. This is based on tightening the safety constraints to counter the uncertainty about the learned mean and covariance. The magnitude of the tightening can be decreased as our confidence in the learned mean and covariance increases (i.e., as we gather more measurements about the environment). Extensions of the method are provided for non-Gaussian process noise with unknown mean and covariance as well as Gaussian uncertainties with state-dependent mean and covariance to accommodate more general environments.

ITJan 18, 2021
Optimal Pre-Processing to Achieve Fairness and Its Relationship with Total Variation Barycenter

Farhad Farokhi

We use disparate impact, i.e., the extent that the probability of observing an output depends on protected attributes such as race and gender, to measure fairness. We prove that disparate impact is upper bounded by the total variation distance between the distribution of the inputs given the protected attributes. We then use pre-processing, also known as data repair, to enforce fairness. We show that utility degradation, i.e., the extent that the success of a forecasting model changes by pre-processing the data, is upper bounded by the total variation distance between the distribution of the data before and after pre-processing. Hence, the problem of finding the optimal pre-processing regiment for enforcing fairness can be cast as minimizing total variations distance between the distribution of the data before and after pre-processing subject to a constraint on the total variation distance between the distribution of the inputs given protected attributes. This problem is a linear program that can be efficiently solved. We show that this problem is intimately related to finding the barycenter (i.e., center of mass) of two distributions when distances in the probability space are measured by total variation distance. We also investigate the effect of differential privacy on fairness using the proposed the total variation distances. We demonstrate the results using numerical experimentation with a practice dataset.

LGNov 30, 2020
Gradient Sparsification Can Improve Performance of Differentially-Private Convex Machine Learning

Farhad Farokhi

We use gradient sparsification to reduce the adverse effect of differential privacy noise on performance of private machine learning models. To this aim, we employ compressed sensing and additive Laplace noise to evaluate differentially-private gradients. Noisy privacy-preserving gradients are used to perform stochastic gradient descent for training machine learning models. Sparsification, achieved by setting the smallest gradient entries to zero, can reduce the convergence speed of the training algorithm. However, by sparsification and compressed sensing, the dimension of communicated gradient and the magnitude of additive noise can be reduced. The interplay between these effects determines whether gradient sparsification improves the performance of differentially-private machine learning models. We investigate this analytically in the paper. We prove that, for small privacy budgets, compression can improve performance of privacy-preserving machine learning models. However, for large privacy budgets, compression does not necessarily improve the performance. Intuitively, this is because the effect of privacy-preserving noise is minimal in large privacy budget regime and thus improvements from gradient sparsification cannot compensate for its slower convergence.

LGNov 24, 2020
When Machine Learning Meets Privacy: A Survey and Outlook

Bo Liu, Ming Ding, Sina Shaham et al.

The newly emerged machine learning (e.g. deep learning) methods have become a strong driving force to revolutionize a wide range of industries, such as smart healthcare, financial technology, and surveillance systems. Meanwhile, privacy has emerged as a big concern in this machine learning-based artificial intelligence era. It is important to note that the problem of privacy preservation in the context of machine learning is quite different from that in traditional data privacy protection, as machine learning can act as both friend and foe. Currently, the work on the preservation of privacy and machine learning (ML) is still in an infancy stage, as most existing solutions only focus on privacy problems during the machine learning process. Therefore, a comprehensive study on the privacy preservation problems and machine learning is required. This paper surveys the state of the art in privacy issues and solutions for machine learning. The survey covers three categories of interactions between privacy and machine learning: (i) private machine learning, (ii) machine learning aided privacy protection, and (iii) machine learning-based privacy attack and corresponding protection schemes. The current research progress in each category is reviewed and the key challenges are identified. Finally, based on our in-depth analysis of the area of privacy and machine learning, we point out future research directions in this field.

ITOct 20, 2020
Non-Stochastic Private Function Evaluation

Farhad Farokhi, Girish Nair

We consider private function evaluation to provide query responses based on private data of multiple untrusted entities in such a way that each cannot learn something substantially new about the data of others. First, we introduce perfect non-stochastic privacy in a two-party scenario. Perfect privacy amounts to conditional unrelatedness of the query response and the private uncertain variable of other individuals conditioned on the uncertain variable of a given entity. We show that perfect privacy can be achieved for queries that are functions of the common uncertain variable, a generalization of the common random variable. We compute the closest approximation of the queries that do not take this form. To provide a trade-off between privacy and utility, we relax the notion of perfect privacy. We define almost perfect privacy and show that this new definition equates to using conditional disassociation instead of conditional unrelatedness in the definition of perfect privacy. Then, we generalize the definitions to multi-party function evaluation (more than two data entities). We prove that uniform quantization of query responses, where the quantization resolution is a function of privacy budget and sensitivity of the query (cf., differential privacy), achieves function evaluation privacy.

STAug 28, 2020
Deconvoluting Kernel Density Estimation and Regression for Locally Differentially Private Data

Farhad Farokhi

Local differential privacy has become the gold-standard of privacy literature for gathering or releasing sensitive individual data points in a privacy-preserving manner. However, locally differential data can twist the probability density of the data because of the additive noise used to ensure privacy. In fact, the density of privacy-preserving data (no matter how many samples we gather) is always flatter in comparison with the density function of the original data points due to convolution with privacy-preserving noise density function. The effect is especially more pronounced when using slow-decaying privacy-preserving noises, such as the Laplace noise. This can result in under/over-estimation of the heavy-hitters. This is an important challenge facing social scientists due to the use of differential privacy in the 2020 Census in the United States. In this paper, we develop density estimation methods using smoothing kernels. We use the framework of deconvoluting kernel density estimators to remove the effect of privacy-preserving noise. This approach also allows us to adapt the results from non-parameteric regression with errors-in-variables to develop regression models based on locally differentially private data. We demonstrate the performance of the developed methods on financial and demographic datasets.

CRAug 11, 2020
Security Versus Privacy

Farhad Farokhi, Peyman Mohajerin Esfahani

Linear queries can be submitted to a server containing private data. The server provides a response to the queries systematically corrupted using an additive noise to preserve the privacy of those whose data is stored on the server. The measure of privacy is inversely proportional to the trace of the Fisher information matrix. It is assumed that an adversary can inject a false bias to the responses. The measure of the security, capturing the ease of detecting the presence of the false data injection, is the sensitivity of the Kullback-Leiber divergence to the additive bias. An optimization problem for balancing privacy and security is proposed and subsequently solved. It is shown that the level of guaranteed privacy times the level of security equals a constant. Therefore, by increasing the level of privacy, the security guarantees can only be weakened and vice versa. Similar results are developed under the differential privacy framework.

LGJun 24, 2020
Distributionally-Robust Machine Learning Using Locally Differentially-Private Data

Farhad Farokhi

We consider machine learning, particularly regression, using locally-differentially private datasets. The Wasserstein distance is used to define an ambiguity set centered at the empirical distribution of the dataset corrupted by local differential privacy noise. The ambiguity set is shown to contain the probability distribution of unperturbed, clean data. The radius of the ambiguity set is a function of the privacy budget, spread of the data, and the size of the problem. Hence, machine learning with locally-differentially private datasets can be rewritten as a distributionally-robust optimization. For general distributions, the distributionally-robust optimization problem can relaxed as a regularized machine learning problem with the Lipschitz constant of the machine learning model as a regularizer. For linear and logistic regression, this regularizer is the dual norm of the model parameters. For Gaussian data, the distributionally-robust optimization problem can be solved exactly to find an optimal regularizer. This approach results in an entirely new regularizer for training linear regression models. Training with this novel regularizer can be posed as a semi-definite program. Finally, the performance of the proposed distributionally-robust machine learning training is demonstrated on practical datasets.

OCJun 2, 2020
Online Stochastic Convex Optimization: Wasserstein Distance Variation

Iman Shames, Farhad Farokhi

Distributionally-robust optimization is often studied for a fixed set of distributions rather than time-varying distributions that can drift significantly over time (which is, for instance, the case in finance and sociology due to underlying expansion of economy and evolution of demographics). This motivates understanding conditions on probability distributions, using the Wasserstein distance, that can be used to model time-varying environments. We can then use these conditions in conjunction with online stochastic optimization to adapt the decisions. We considers an online proximal-gradient method to track the minimizers of expectations of smooth convex functions parameterised by a random variable whose probability distributions continuously evolve over time at a rate similar to that of the rate at which the decision maker acts. We revisit the concepts of estimation and tracking error inspired by systems and control literature and provide bounds for them under strong convexity, Lipschitzness of the gradient, and bounds on the probability distribution drift. Further, noting that computing projections for a general feasible sets might not be amenable to online implementation (due to computational constraints), we propose an exact penalty method. Doing so allows us to relax the uniform boundedness of the gradient and establish dynamic regret bounds for tracking and estimation error. We further introduce a constraint-tightening approach and relate the amount of tightening to the probability of satisfying the constraints.

ITApr 23, 2020
Measuring Information Leakage in Non-stochastic Brute-Force Guessing

Farhad Farokhi, Ni Ding

We propose an operational measure of information leakage in a non-stochastic setting to formalize privacy against a brute-force guessing adversary. We use uncertain variables, non-probabilistic counterparts of random variables, to construct a guessing framework in which an adversary is interested in determining private information based on uncertain reports. We consider brute-force trial-and-error guessing in which an adversary can potentially check all the possibilities of the private information that are compatible with the available outputs to find the actual private realization. The ratio of the worst-case number of guesses for the adversary in the presence of the output and in the absence of it captures the reduction in the adversary's guessing complexity and is thus used as a measure of private information leakage. We investigate the relationship between the newly-developed measure of information leakage with the existing non-stochastic maximin information and stochastic maximal leakage that are shown arise in one-shot guessing.

LGMar 18, 2020
The Cost of Privacy in Asynchronous Differentially-Private Machine Learning

Farhad Farokhi, Nan Wu, David Smith et al.

We consider training machine learning models using Training data located on multiple private and geographically-scattered servers with different privacy settings. Due to the distributed nature of the data, communicating with all collaborating private data owners simultaneously may prove challenging or altogether impossible. In this paper, we develop differentially-private asynchronous algorithms for collaboratively training machine-learning models on multiple private datasets. The asynchronous nature of the algorithms implies that a central learner interacts with the private data owners one-on-one whenever they are available for communication without needing to aggregate query responses to construct gradients of the entire fitness function. Therefore, the algorithm efficiently scales to many data owners. We define the cost of privacy as the difference between the fitness of a privacy-preserving machine-learning model and the fitness of trained machine-learning model in the absence of privacy concerns. We prove that we can forecast the performance of the proposed privacy-preserving asynchronous algorithms. We demonstrate that the cost of privacy has an upper bound that is inversely proportional to the combined size of the training datasets squared and the sum of the privacy budgets squared. We validate the theoretical results with experiments on financial and medical datasets. The experiments illustrate that collaboration among more than 10 data owners with at least 10,000 records with privacy budgets greater than or equal to 1 results in a superior machine-learning model in comparison to a model trained in isolation on only one of the datasets, illustrating the value of collaboration and the cost of the privacy. The number of the collaborating datasets can be lowered if the privacy budget is higher.

LGFeb 17, 2020
Data and Model Dependencies of Membership Inference Attack

Shakila Mahjabin Tonni, Dinusha Vatsalan, Farhad Farokhi et al.

Machine learning (ML) models have been shown to be vulnerable to Membership Inference Attacks (MIA), which infer the membership of a given data point in the target dataset by observing the prediction output of the ML model. While the key factors for the success of MIA have not yet been fully understood, existing defense mechanisms such as using L2 regularization \cite{10shokri2017membership} and dropout layers \cite{salem2018ml} take only the model's overfitting property into consideration. In this paper, we provide an empirical analysis of the impact of both the data and ML model properties on the vulnerability of ML techniques to MIA. Our results reveal the relationship between MIA accuracy and properties of the dataset and training model in use. In particular, we show that the size of shadow dataset, the class and feature balance and the entropy of the target dataset, the configurations and fairness of the training model are the most influential factors. Based on those experimental findings, we conclude that along with model overfitting, multiple properties jointly contribute to MIA success instead of any single property. Building on our experimental findings, we propose using those data and model properties as regularizers to protect ML models against MIA. Our results show that the proposed defense mechanisms can reduce the MIA accuracy by up to 25\% without sacrificing the ML model prediction utility.

LGJan 29, 2020
Regularization Helps with Mitigating Poisoning Attacks: Distributionally-Robust Machine Learning Using the Wasserstein Distance

Farhad Farokhi

We use distributionally-robust optimization for machine learning to mitigate the effect of data poisoning attacks. We provide performance guarantees for the trained model on the original data (not including the poison records) by training the model for the worst-case distribution on a neighbourhood around the empirical distribution (extracted from the training dataset corrupted by a poisoning attack) defined using the Wasserstein distance. We relax the distributionally-robust machine learning problem by finding an upper bound for the worst-case fitness based on the empirical sampled-averaged fitness and the Lipschitz-constant of the fitness function (on the data for given model parameters) as regularizer. For regression models, we prove that this regularizer is equal to the dual norm of the model parameters. We use the Wine Quality dataset, the Boston Housing Market dataset, and the Adult dataset for demonstrating the results of this paper.

LGJan 29, 2020
Modelling and Quantifying Membership Information Leakage in Machine Learning

Farhad Farokhi, Mohamed Ali Kaafar

Machine learning models have been shown to be vulnerable to membership inference attacks, i.e., inferring whether individuals' data have been used for training models. The lack of understanding about factors contributing success of these attacks motivates the need for modelling membership information leakage using information theory and for investigating properties of machine learning models and training algorithms that can reduce membership information leakage. We use conditional mutual information leakage to measure the amount of information leakage from the trained machine learning model about the presence of an individual in the training dataset. We devise an upper bound for this measure of information leakage using Kullback--Leibler divergence that is more amenable to numerical computation. We prove a direct relationship between the Kullback--Leibler membership information leakage and the probability of success for a hypothesis-testing adversary examining whether a particular data record belongs to the training dataset of a machine learning model. We show that the mutual information leakage is a decreasing function of the training dataset size and the regularization weight. We also prove that, if the sensitivity of the machine learning model (defined in terms of the derivatives of the fitness with respect to model parameters) is high, more membership information is potentially leaked. This illustrates that complex models, such as deep neural networks, are more susceptible to membership inference attacks in comparison to simpler models with fewer degrees of freedom. We show that the amount of the membership information leakage is reduced by $\mathcal{O}(\log^{1/2}(δ^{-1})ε^{-1})$ when using Gaussian $(ε,δ)$-differentially-private additive noises.

CRDec 29, 2019
Privacy-Preserving Public Release of Datasets for Support Vector Machine Classification

Farhad Farokhi

We consider the problem of publicly releasing a dataset for support vector machine classification while not infringing on the privacy of data subjects (i.e., individuals whose private information is stored in the dataset). The dataset is systematically obfuscated using an additive noise for privacy protection. Motivated by the Cramer-Rao bound, inverse of the trace of the Fisher information matrix is used as a measure of the privacy. Conditions are established for ensuring that the classifier extracted from the original dataset and the obfuscated one are close to each other (capturing the utility). The optimal noise distribution is determined by maximizing a weighted sum of the measures of privacy and utility. The optimal privacy-preserving noise is proved to achieve local differential privacy. The results are generalized to a broader class of optimization-based supervised machine learning algorithms. Applicability of the methodology is demonstrated on multiple datasets.

CRNov 12, 2019
Developing Non-Stochastic Privacy-Preserving Policies Using Agglomerative Clustering

Ni Ding, Farhad Farokhi

We consider a non-stochastic privacy-preserving problem in which an adversary aims to infer sensitive information $S$ from publicly accessible data $X$ without using statistics. We consider the problem of generating and releasing a quantization $\hat{X}$ of $X$ to minimize the privacy leakage of $S$ to $\hat{X}$ while maintaining a certain level of utility (or, inversely, the quantization loss). The variables $S$ and $S$ are treated as bounded and non-probabilistic, but are otherwise general. We consider two existing non-stochastic privacy measures, namely the maximum uncertainty reduction $L_0(S \rightarrow \hat{X})$ and the refined information $I_*(S; \hat{X})$ (also called the maximin information) of $S$. For each privacy measure, we propose a corresponding agglomerative clustering algorithm that converges to a locally optimal quantization solution $\hat{X}$ by iteratively merging elements in the alphabet of $X$. To instantiate the solution to this problem, we consider two specific utility measures, the worst-case resolution of $X$ by observing $\hat{X}$ and the maximal distortion of the released data $\hat{X}$. We show that the value of the maximin information $I_*(S; \hat{X})$ can be determined by dividing the confusability graph into connected subgraphs. Hence, $I_*(S; \hat{X})$ can be reduced by merging nodes connecting subgraphs. The relation to the probabilistic information-theoretic privacy is also studied by noting that the G{á}cs-K{ö}rner common information is the stochastic version of $I_*$ and indicates the attainability of statistical indistinguishability.

ITOct 29, 2019
Noiseless Privacy

Farhad Farokhi

In this paper, we define noiseless privacy, as a non-stochastic rival to differential privacy, requiring that the outputs of a mechanism (i.e., function composition of a privacy-preserving mapping and a query) can attain only a few values while varying the data of an individual (the logarithm of the number of the distinct values is bounded by the privacy budget). Therefore, the output of the mechanism is not fully informative of the data of the individuals in the dataset. We prove several guarantees for noiselessly-private mechanisms. The information content of the output about the data of an individual, even if an adversary knows all the other entries of the private dataset, is bounded by the privacy budget. The zero-error capacity of memory-less channels using noiselessly private mechanisms for transmission is upper bounded by the privacy budget. The performance of a non-stochastic hypothesis-testing adversary is bounded again by the privacy budget. Finally, assuming that an adversary has access to a stochastic prior on the dataset, we prove that the estimation error of the adversary for individual entries of the dataset is lower bounded by a decreasing function of the privacy budget. In this case, we also show that the maximal information leakage is bounded by the privacy budget. In addition to privacy guarantees, we prove that noiselessly-private mechanisms admit composition theorem and post-processing does not weaken their privacy guarantees. We prove that quantization operators can ensure noiseless privacy if the number of quantization levels is appropriately selected based on the sensitivity of the query and the privacy budget. Finally, we illustrate the privacy merits of noiseless privacy using multiple datasets in energy and transport.

CRSep 25, 2019
Differential Privacy for Evolving Almost-Periodic Datasets with Continual Linear Queries: Application to Energy Data Privacy

Farhad Farokhi

For evolving datasets with continual reports, the composition rule for differential privacy (DP) dictates that the scale of DP noise must grow linearly with the number of the queries, or that the privacy budget must be split equally between all the queries, so that the privacy budget across all the queries remains bounded and consistent with the privacy guarantees. To avoid this drawback of DP, we consider datasets containing almost periodic time series, composed of periodic components and noisy variations on top that are independent across periods. Our interest in these datasets is motivated by that, for reporting on private periodic time series, we do not need to divide the privacy budget across the entire, possibly infinite, horizon. Instead, for periodic time series, we generate DP reports for the first period and report the same DP reports periodically. In practice, however, exactly periodic time series do not exist as the data always contains small variations due to random or uncertain events. For instance, the energy consumption of a household may repeat the same daily pattern with slight variations due to minor changes to the habits of the individuals. The underlying periodic pattern is a function of the private information of the households. It might be desired to protect the privacy of households by not leaking information about the recurring patterns while the individual daily variations are almost noise-like with little to no privacy concerns (depending on the situation). Motivated by this, we define DP for almost periodic datasets and develop a Laplace mechanism for responding to linear queries. We provide statistical tools for testing the validity of almost periodicity assumption. We use multiple energy datasets containing smart-meter measurements of households to validate almost periodicity assumption. We generate DP aggregate reports and investigate their utility.

CRAug 14, 2019
Taking a Lesson from Quantum Particles for Statistical Data Privacy

Farhad Farokhi

Privacy is under threat from artificial intelligence revolution fueled by unprecedented abundance of data. Differential privacy, an established candidate for privacy protection, is susceptible to adversarial attacks, acts conservatively, and leads to miss-implementations because of lacking systematic methods for setting its parameters (known as the privacy budget). An alternative is information-theoretic privacy using entropy with the drawback of requiring prior distribution of the private data. Here, by using the Fisher information, information-theoretic privacy framework is extended to avoid unnecessary assumptions on the private data. The optimal privacy-preserving additive noise, extracted by minimizing the Fisher information, must follow the time-independent Schrodinger's equation. A fundamental trade-off between privacy and utility is also proved, reminiscent of the Heisenberg uncertainty principle.

CRAug 12, 2019
Temporally Discounted Differential Privacy for Evolving Datasets on an Infinite Horizon

Farhad Farokhi

We define discounted differential privacy, as an alternative to (conventional) differential privacy, to investigate privacy of evolving datasets, containing time series over an unbounded horizon. We use privacy loss as a measure of the amount of information leaked by the reports at a certain fixed time. We observe that privacy losses are weighted equally across time in the definition of differential privacy, and therefore the magnitude of privacy-preserving additive noise must grow without bound to ensure differential privacy over an infinite horizon. Motivated by the discounted utility theory within the economics literature, we use exponential and hyperbolic discounting of privacy losses across time to relax the definition of differential privacy under continual observations. This implies that privacy losses in distant past are less important than the current ones to an individual. We use discounted differential privacy to investigate privacy of evolving datasets using additive Laplace noise and show that the magnitude of the additive noise can remain bounded under discounted differential privacy. We illustrate the quality of privacy-preserving mechanisms satisfying discounted differential privacy on smart-meter measurement time-series of real households, made publicly available by Ausgrid (an Australian electricity distribution company).

CRJun 24, 2019
A Game-Theoretic Approach to Adversarial Linear Support Vector Classification

Farhad Farokhi

In this paper, we employ a game-theoretic model to analyze the interaction between an adversary and a classifier. There are two classes (i.e., positive and negative classes) to which data points can belong. The adversary is interested in maximizing the probability of miss-detection for the positive class (i.e., false negative probability). The adversary however does not want to significantly modify the data point so that it still maintains favourable traits of the original class. The classifier, on the other hand, is interested in maximizing the probability of correct detection for the positive class (i.e., true positive probability) subject to a lower-bound on the probability of correct detection for the negative class (i.e., true negative probability). For conditionally Gaussian data points (conditioned on the class) and linear support vector machine classifiers, we rewrite the optimization problems of the adversary and the classifier as convex optimization problems and use best response dynamics to learn an equilibrium of the game. This results in computing a linear support vector machine classifier that is robust against adversarial input manipulations. We illustrate the framework on a synthetic dataset and a public Cardiovascular Disease dataset.

CRJun 24, 2019
The Value of Collaboration in Convex Machine Learning with Differential Privacy

Nan Wu, Farhad Farokhi, David Smith et al.

In this paper, we apply machine learning to distributed private data owned by multiple data owners, entities with access to non-overlapping training datasets. We use noisy, differentially-private gradients to minimize the fitness cost of the machine learning model using stochastic gradient descent. We quantify the quality of the trained model, using the fitness cost, as a function of privacy budget and size of the distributed datasets to capture the trade-off between privacy and utility in machine learning. This way, we can predict the outcome of collaboration among privacy-aware data owners prior to executing potentially computationally-expensive machine learning algorithms. Particularly, we show that the difference between the fitness of the trained machine learning model using differentially-private gradient queries and the fitness of the trained machine model in the absence of any privacy concerns is inversely proportional to the size of the training datasets squared and the privacy budget squared. We successfully validate the performance prediction with the actual performance of the proposed privacy-aware learning algorithms, applied to: financial datasets for determining interest rates of loans using regression; and detecting credit card frauds using support vector machines.

CRFeb 19, 2019
Implementing Homomorphic Encryption Based Secure Feedback Control for Physical Systems

Julian Tran, Farhad Farokhi, Michael Cantoni et al.

This paper is about an encryption based approach to the secure implementation of feedback controllers for physical systems. Specifically, Paillier's homomorphic encryption is used to digitally implement a class of linear dynamic controllers, which includes the commonplace static gain and PID type feedback control laws as special cases. The developed implementation is amenable to Field Programmable Gate Array (FPGA) realization. Experimental results, including timing analysis and resource usage characteristics for different encryption key lengths, are presented for the realization of an inverted pendulum controller; as this is an unstable plant, the control is necessarily fast.

ITApr 16, 2019
Non-Stochastic Hypothesis Testing with Application to Privacy Against Hypothesis-Testing Adversary

Farhad Farokhi

In this paper, we consider privacy against hypothesis testing adversaries within a non-stochastic framework. We develop a theory of non-stochastic hypothesis testing by borrowing the notion of uncertain variables from non-stochastic information theory. We define tests as binary-valued mappings on uncertain variables and prove a fundamental bound on the best performance of tests in non-stochastic hypothesis testing. We use this bound to develop a measure of privacy. We then construct reporting policies with prescribed privacy and utility guarantees. The utility of a reporting policy is measured by the distance between the reported and original values. We illustrate the effects of using such privacy-preserving reporting polices on a publicly-available practical dataset of preferences and demographics of young individuals, aged between 15-30, with Slovakian nationality.

OCDec 11, 2018
Secure and Private Implementation of Dynamic Controllers Using Semi-Homomorphic Encryption

Carlos Murguia, Farhad Farokhi, Iman Shames

This paper presents a secure and private implementation of linear time-invariant dynamic controllers using Paillier's encryption, a semi-homomorphic encryption method. To avoid overflow or underflow within the encryption domain, the state of the controller is reset periodically. A control design approach is presented to ensure stability and optimize performance of the closed-loop system with encrypted controller.

ITOct 26, 2018
Development and Analysis of Deterministic Privacy-Preserving Policies Using Non-Stochastic Information Theory

Farhad Farokhi

A deterministic privacy metric using non-stochastic information theory is developed. Particularly, minimax information is used to construct a measure of information leakage, which is inversely proportional to the measure of privacy. Anyone can submit a query to a trusted agent with access to a non-stochastic uncertain private dataset. Optimal deterministic privacy-preserving policies for responding to the submitted query are computed by maximizing the measure of privacy subject to a constraint on the worst-case quality of the response (i.e., the worst-case difference between the response by the agent and the output of the query computed on the private dataset). The optimal privacy-preserving policy is proved to be a piecewise constant function in the form of a quantization operator applied on the output of the submitted query. The measure of privacy is also used to analyze the performance of $k$-anonymity methodology (a popular deterministic mechanism for privacy-preserving release of datasets using suppression and generalization techniques), proving that it is in fact not privacy-preserving.

SYSep 10, 2018
On Privacy of Quantized Sensor Measurements through Additive Noise

Carlos Murguia, Iman Shames, Farhad Farokhi et al.

We study the problem of maximizing privacy of quantized sensor measurements by adding random variables. In particular, we consider the setting where information about the state of a process is obtained using noisy sensor measurements. This information is quantized and sent to a remote station through an unsecured communication network. It is desired to keep the state of the process private; however, because the network is not secure, adversaries might have access to sensor information, which could be used to estimate the process state. To avoid an accurate state estimation, we add random numbers to the quantized sensor measurements and send the sum to the remote station instead. The distribution of these random variables is designed to minimize the mutual information between the sum and the quantized sensor measurements for a desired level of distortion -- how different the sum and the quantized sensor measurements are allowed to be. Simulations are presented to illustrate our results.

SYJul 24, 2017
Integration of Information Patterns in the Modeling and Design of Mobility Management Services

Alexander Keimer, Nicolas Laurent-Brouty, Farhad Farokhi et al.

Over the last decade, the rise of the mobile internet and the usage of mobile devices has enabled ubiquitous traffic information. With the increased adoption of specific smartphone applications, the number of users of routing applications has become large enough to disrupt traffic flow patterns in a significant manner. Similarly, but at a slightly slower pace, novel services for freight transportation and city logistics improve the efficiency of goods transportation and change the use of road infrastructure. The present article provides a general four-layer framework for modeling these new trends. The main motivation behind the development is to provide a unifying formal system description that can at the same time encompass system physics (flow and motion of vehicles) as well as coordination strategies under various information and cooperation structures. To showcase the framework, we apply it to the specific challenge of modeling and analyzing the integration of routing applications in today's transportation systems. In this framework, at the lowest layer (flow dynamics) we distinguish app users from non-app users. A distributed parameter model based on a non-local partial differential equation is introduced and analyzed. The second layer incorporates connected services (e.g., routing) and other applications used to optimize the local performance of the system. As inputs to those applications, we propose a third layer introducing the incentive design and global objectives, which are typically varying over the day depending on road and weather conditions, external events etc. The high-level planning is handled on the fourth layer taking social long-term objectives into account.

OCJun 6, 2017
Preserving Privacy of Finite Impulse Response Systems

Giulio Bottegal, Farhad Farokhi, Iman Shames

Adding input and output noises for increasing model identification error of finite impulse response (FIR) systems is considered. This is motivated by the desire to protect the model of the system as a trade secret by rendering model identification techniques ineffective. Optimal filters for constructing additive noises that maximizes the identification error subject to maintaining the closed-loop performance degradation below a limit are constructed. Furthermore, differential privacy is used for designing output noises that preserve the privacy of the model.

SYFeb 28, 2017
Private and Secure Coordination of Match-Making for Heavy-Duty Vehicle Platooning

Farhad Farokhi, Iman Shames, Karl H. Johansson

A secure and private framework for inter-agent communication and coordination is developed. This allows an agent, in our case a fleet owner, to ask questions or submit queries in an encrypted fashion using semi-homomorphic encryption. The submitted query can be about the interest of the other fleet owners for using a road at a specific time of the day, for instance, for the purpose of collaborative vehicle platooning. The other agents can then provide appropriate responses without knowing the content of the questions or the queries. Strong privacy and security guarantees are provided for the agent who is submitting the queries. It is also shown that the amount of the information that this agent can extract from the other agent is bounded. In fact, with submitting one query, a sophisticated agent can at most extract the answer to two queries. This secure communication platform is used subsequently to develop a distributed coordination mechanisms among fleet owners.