Marten van Dijk

LG
h-index21
38papers
1,082citations
Novelty52%
AI Score56

38 Papers

LGNov 26, 2022
Game Theoretic Mixed Experts for Combinational Adversarial Machine Learning

Ethan Rathbun, Kaleel Mahmood, Sohaib Ahmad et al.

Recent advances in adversarial machine learning have shown that defenses considered to be robust are actually susceptible to adversarial attacks which are specifically customized to target their weaknesses. These defenses include Barrage of Random Transforms (BaRT), Friendly Adversarial Training (FAT), Trash is Treasure (TiT) and ensemble models made up of Vision Transformers (ViTs), Big Transfer models and Spiking Neural Networks (SNNs). We first conduct a transferability analysis, to demonstrate the adversarial examples generated by customized attacks on one defense, are not often misclassified by another defense. This finding leads to two important questions. First, how can the low transferability between defenses be utilized in a game theoretic framework to improve the robustness? Second, how can an adversary within this framework develop effective multi-model attacks? In this paper, we provide a game-theoretic framework for ensemble adversarial attacks and defenses. Our framework is called Game theoretic Mixed Experts (GaME). It is designed to find the Mixed-Nash strategy for both a detector based and standard defender, when facing an attacker employing compositional adversarial attacks. We further propose three new attack algorithms, specifically designed to target defenses with randomized transformations, multi-model voting schemes, and adversarial detector architectures. These attacks serve to both strengthen defenses generated by the GaME framework and verify their robustness against unforeseen attacks. Overall, our framework and analyses advance the field of adversarial machine learning by yielding new insights into compositional attack and defense formulations.

LGJul 21, 2023
Batch Clipping and Adaptive Layerwise Clipping for Differential Private Stochastic Gradient Descent

Toan N. Nguyen, Phuong Ha Nguyen, Lam M. Nguyen et al.

Each round in Differential Private Stochastic Gradient Descent (DPSGD) transmits a sum of clipped gradients obfuscated with Gaussian noise to a central server which uses this to update a global model which often represents a deep neural network. Since the clipped gradients are computed separately, which we call Individual Clipping (IC), deep neural networks like resnet-18 cannot use Batch Normalization Layers (BNL) which is a crucial component in deep neural networks for achieving a high accuracy. To utilize BNL, we introduce Batch Clipping (BC) where, instead of clipping single gradients as in the orginal DPSGD, we average and clip batches of gradients. Moreover, the model entries of different layers have different sensitivities to the added Gaussian noise. Therefore, Adaptive Layerwise Clipping methods (ALC), where each layer has its own adaptively finetuned clipping constant, have been introduced and studied, but so far without rigorous DP proofs. In this paper, we propose {\em a new ALC and provide rigorous DP proofs for both BC and ALC}. Experiments show that our modified DPSGD with BC and ALC for CIFAR-$10$ with resnet-$18$ converges while DPSGD with IC and ALC does not.

LGMar 8, 2023
Considerations on the Theory of Training Models with Differential Privacy

Marten van Dijk, Phuong Ha Nguyen

In federated learning collaborative learning takes place by a set of clients who each want to remain in control of how their local training data is used, in particular, how can each client's local training data remain private? Differential privacy is one method to limit privacy leakage. We provide a general overview of its framework and provable properties, adopt the more recent hypothesis based definition called Gaussian DP or $f$-DP, and discuss Differentially Private Stochastic Gradient Descent (DP-SGD). We stay at a meta level and attempt intuitive explanations and insights \textit{in this book chapter}.

LGDec 12, 2022
Generalizing DP-SGD with Shuffling and Batch Clipping

Marten van Dijk, Phuong Ha Nguyen, Toan N. Nguyen et al.

Classical differential private DP-SGD implements individual clipping with random subsampling, which forces a mini-batch SGD approach. We provide a general differential private algorithmic framework that goes beyond DP-SGD and allows any possible first order optimizers (e.g., classical SGD and momentum based SGD approaches) in combination with batch clipping, which clips an aggregate of computed gradients rather than summing clipped gradients (as is done in individual clipping). The framework also admits sampling techniques beyond random subsampling such as shuffling. Our DP analysis follows the $f$-DP approach and introduces a new proof technique which allows us to derive simple closed form expressions and to also analyse group privacy. In particular, for $E$ epochs work and groups of size $g$, we show a $\sqrt{g E}$ DP dependency for batch clipping with shuffling.

CRApr 21
Function Recovery Attacks in Gate-Hiding Garbled Circuits using SAT Solving

Chao Yin, Zunchen Huang, Chenglu Jin et al.

Semi-Private Function Evaluation (SPFE) enables joint computation while protecting both input data and the function itself. A practical instantiation is gate-hiding garbled circuits, which conceal gate functionalities while revealing circuit topology. Existing security definitions intentionally exclude leakage through topology, leaving its concrete impact on function privacy largely unexplored. We present a SAT-based function-recovery attack that reconstructs hidden gate operations from a circuit's public topology under two attacker knowledge models. Our approach combines topology-preserving simplification theorems with a decomposition of the recovery task into smaller SAT queries, thereby reducing the candidate gate-type assignment space and improving recovery performance. We evaluate the attack on ISCAS benchmarks, representative secure computation circuits, and fault-tolerant sensor fusion circuits under a 24-hour recovery budget. Compared to a baseline attack, the optimized version substantially reduces recovery time and, in some cases, completes recovery within the evaluation budget where the baseline does not. Our results show that revealing circuit topology can materially assist recovery of hidden gate functionality, identifying topology as a security-relevant leakage channel in gate-hiding garbled circuits.

LGJan 15
Fundamental Limitations of Favorable Privacy-Utility Guarantees for DP-SGD

Murat Bilgehan Ertan, Marten van Dijk

Differentially Private Stochastic Gradient Descent (DP-SGD) is the dominant paradigm for private training, but its fundamental limitations under worst-case adversarial privacy definitions remain poorly understood. We analyze DP-SGD in the $f$-differential privacy framework, which characterizes privacy via hypothesis-testing trade-off curves, and study shuffled sampling over a single epoch with $M$ gradient updates. We derive an explicit suboptimal upper bound on the achievable trade-off curve. This result induces a geometric lower bound on the separation $κ$ which is the maximum distance between the mechanism's trade-off curve and the ideal random-guessing line. Because a large separation implies significant adversarial advantage, meaningful privacy requires small $κ$. However, we prove that enforcing a small separation imposes a strict lower bound on the Gaussian noise multiplier $σ$, which directly limits the achievable utility. In particular, under the standard worst-case adversarial model, shuffled DP-SGD must satisfy $σ\ge \frac{1}{\sqrt{2\ln M}}$ $\quad\text{or}\quad$ $κ\ge\ \frac{1}{\sqrt{8}}\!\left(1-\frac{1}{\sqrt{4π\ln M}}\right)$, and thus cannot simultaneously achieve strong privacy and high utility. Although this bound vanishes asymptotically as $M \to \infty$, the convergence is extremely slow: even for practically relevant numbers of updates the required noise magnitude remains substantial. We further show that the same limitation extends to Poisson subsampling up to constant factors. Our experiments confirm that the noise levels implied by this bound leads to significant accuracy degradation at realistic training settings, thus showing a critical bottleneck in DP-SGD under standard worst-case adversarial assumptions.

LGMay 7
PACZero: PAC-Private Fine-Tuning of Language Models via Sign Quantization

Murat Bilgehan Ertan, Xiaochen Zhu, Phuong Ha Nguyen et al.

We introduce PACZero, a family of PAC-private zeroth-order mechanisms for fine-tuning large language models that delivers usable utility at $I(S^*; Y_{1:T})=0$. This privacy regime bounds the membership-inference attack (MIA) posterior success rate at the prior, an MIA-resistance level the DP framework matches only at $\varepsilon=0$ and infinite noise. All DP-ZO comparisons below are matched at the MIA posterior level. The key insight is that PAC Privacy charges mutual information only when the release depends on which candidate subset is the secret. Sign-quantizing subset-aggregated zeroth-order gradients creates frequent unanimity, steps at which every candidate subset agrees on the update direction; at these steps the released sign costs zero conditional mutual information. We propose two variants that span the privacy-utility trade-off: PACZero-MI (budgeted MI via exact calibration on the binary release) and PACZero-ZPL ($I=0$ via a uniform coin flip on disagreement steps). We evaluate on SST-2 and SQuAD with OPT-1.3B and OPT-6.7B in both LoRA and full-parameter tracks. On SST-2 OPT-1.3B full fine-tuning at $I=0$, PACZero-ZPL reaches ${88.99\pm0.91}$, within $2.1$pp of the non-private MeZO baseline ($91.1$ FT). No prior method produces usable utility in the high-privacy regime $\varepsilon<1$, and PACZero-ZPL obtains competitive SST-2 accuracy and nontrivial SQuAD F1 across OPT-1.3B and OPT-6.7B at $I=0$.

LGMay 7
Trade-off Functions for DP-SGD with Subsampling based on Random Shuffling: Tight Upper and Lower Bounds

Marten van Dijk, Murat Bilgehan Ertan

We derive a tight analysis of the trade-off function for Differentially Private Stochastic Gradient Descent (DP-SGD) with subsampling based on random shuffling within the $f$-DP framework. Our analysis covers the regime $σ\geq \sqrt{3/\ln M}$, where $σ$ is the noise multiplier and $M$ is the number of rounds within a single epoch. Unlike $f$-DP analyses for Poisson subsampling, which yield non-closed implicit formulas that can be machine computed but are non-transparent, random shuffling admits a tight analysis yielding transparent and interpretable closed-form bounds. Our concrete bounds, derived via the Berry-Esseen theorem, are tight up to constant factors within the proof framework. We demonstrate worked parameter settings for a single epoch ($E=1$) with a corresponding trade-off function $\geq 1-a-δ$, that is, only $δ$ below the ideal random guessing diagonal $1-a$: For $δ= 1/100$ and $σ= 1$, roughly $M \approx 1.14\times 10^6$ rounds and $N \approx 1.14\times 10^7$ training samples suffice to achieve meaningful differential privacy. This is in contrast to recent negative results for the regime $σ\leq 1/\sqrt{2 \ln M}$. Our concrete bounds can be composed over multiple epochs leading to $δ$ having a linear in $E$ dependency, which restricts $E=O(\sqrt{M})$. To go beyond Berry--Esseen, we introduce a new proof technique based on a generalization of the law of large numbers that yields an asymptotic random guessing diagonal-limit result: if $E=c_M^2M$ with $c_M\to 0$, then the $E$-fold composed trade-off function satisfies $f^{\otimes E}(a)\to 1-a$ uniformly in $a\in[0,1]$ with $δ$ having only an $O(\sqrt{E})$ dependency. We compare this asymptotic regime with the corresponding Poisson subsampling asymptotic, and highlight the characterization of explicit convergence rates as an open question.

LGMar 12, 2024
DP-TLDM: Differentially Private Tabular Latent Diffusion Model

Chaoyi Zhu, Jiayi Tang, Juan F. Pérez et al.

Synthetic data from generative models emerges as the privacy-preserving data sharing solution. Such a synthetic data set shall resemble the original data without revealing identifiable private information. Till date, the prior focus on limited types of tabular synthesizers and a small number of privacy attacks, particularly on Generative Adversarial Networks, and overlooks membership inference attacks and defense strategies, i.e., differential privacy. Motivated by the conundrum of keeping high data quality and low privacy risk of synthetic data tables, we propose DPTLDM, Differentially Private Tabular Latent Diffusion Model, which is composed of an autoencoder network to encode the tabular data and a latent diffusion model to synthesize the latent tables. Following the emerging f-DP framework, we apply DP-SGD to train the auto-encoder in combination with batch clipping and use the separation value as the privacy metric to better capture the privacy gain from DP algorithms. Our empirical evaluation demonstrates that DPTLDM is capable of achieving a meaningful theoretical privacy guarantee while also significantly enhancing the utility of synthetic data. Specifically, compared to other DP-protected tabular generative models, DPTLDM improves the synthetic quality by an average of 35% in data resemblance, 15% in the utility for downstream tasks, and 50% in data discriminability, all while preserving a comparable level of privacy risk.

CRJan 19
On the Evidentiary Limits of Membership Inference for Copyright Auditing

Murat Bilgehan Ertan, Emirhan Böge, Min Chen et al.

As large language models (LLMs) are trained on increasingly opaque corpora, membership inference attacks (MIAs) have been proposed to audit whether copyrighted texts were used during training, despite growing concerns about their reliability under realistic conditions. We ask whether MIAs can serve as admissible evidence in adversarial copyright disputes where an accused model developer may obfuscate training data while preserving semantic content, and formalize this setting through a judge-prosecutor-accused communication protocol. To test robustness under this protocol, we introduce SAGE (Structure-Aware SAE-Guided Extraction), a paraphrasing framework guided by Sparse Autoencoders (SAEs) that rewrites training data to alter lexical structure while preserving semantic content and downstream utility. Our experiments show that state-of-the-art MIAs degrade when models are fine-tuned on SAGE-generated paraphrases, indicating that their signals are not robust to semantics-preserving transformations. While some leakage remains in certain fine-tuning regimes, these results suggest that MIAs are brittle in adversarial settings and insufficient, on their own, as a standalone mechanism for copyright auditing of LLMs.

CVApr 23, 2025
Beyond Anonymization: Object Scrubbing for Privacy-Preserving 2D and 3D Vision Tasks

Murat Bilgehan Ertan, Ronak Sahu, Phuong Ha Nguyen et al.

We introduce ROAR (Robust Object Removal and Re-annotation), a scalable framework for privacy-preserving dataset obfuscation that eliminates sensitive objects instead of modifying them. Our method integrates instance segmentation with generative inpainting to remove identifiable entities while preserving scene integrity. Extensive evaluations on 2D COCO-based object detection show that ROAR achieves 87.5% of the baseline detection average precision (AP), whereas image dropping achieves only 74.2% of the baseline AP, highlighting the advantage of scrubbing in preserving dataset utility. The degradation is even more severe for small objects due to occlusion and loss of fine-grained details. Furthermore, in NeRF-based 3D reconstruction, our method incurs a PSNR loss of at most 1.66 dB while maintaining SSIM and improving LPIPS, demonstrating superior perceptual quality. Our findings establish object removal as an effective privacy framework, achieving strong privacy guarantees with minimal performance trade-offs. The results highlight key challenges in generative inpainting, occlusion-robust segmentation, and task-specific scrubbing, setting the foundation for future advancements in privacy-preserving vision systems.

LGFeb 7, 2022
Finite-Sum Optimization: A New Perspective for Convergence to a Global Solution

Lam M. Nguyen, Trang H. Tran, Marten van Dijk

Deep neural networks (DNNs) have shown great success in many machine learning tasks. Their training is challenging since the loss surface of the network architecture is generally non-convex, or even non-smooth. How and under what assumptions is guaranteed convergence to a \textit{global} minimum possible? We propose a reformulation of the minimization problem allowing for a new recursive algorithmic framework. By using bounded style assumptions, we prove convergence to an $\varepsilon$-(global) minimum using $\mathcal{\tilde{O}}(1/\varepsilon^3)$ gradient computations. Our theoretical foundation motivates further study, implementation, and optimization of the new algorithmic framework and further investigation of its non-standard bounded style assumptions. This new direction broadens our understanding of why and under what circumstances training of a DNN converges to a global minimum.

CRJan 5, 2022
Secure Remote Attestation with Strong Key Insulation Guarantees

Deniz Gurevin, Chenglu Jin, Phuong Ha Nguyen et al.

Recent years have witnessed a trend of secure processor design in both academia and industry. Secure processors with hardware-enforced isolation can be a solid foundation of cloud computation in the future. However, due to recent side-channel attacks, the commercial secure processors failed to deliver the promises of a secure isolated execution environment. Sensitive information inside the secure execution environment always gets leaked via side channels. This work considers the most powerful software-based side-channel attackers, i.e., an All Digital State Observing (ADSO) adversary who can observe all digital states, including all digital states in secure enclaves. Traditional signature schemes are not secure in ADSO adversarial model. We introduce a new cryptographic primitive called One-Time Signature with Secret Key Exposure (OTS-SKE), which ensures no one can forge a valid signature of a new message or nonce even if all secret session keys are leaked. OTS-SKE enables us to sign attestation reports securely under the ADSO adversary. We also minimize the trusted computing base by introducing a secure co-processor into the system, and the interaction between the secure co-processor and the attestation processor is unidirectional. That is, the co-processor takes no inputs from the processor and only generates secret keys for the processor to fetch. Our experimental results show that the signing of OTS-SKE is faster than that of Elliptic Curve Digital Signature Algorithm (ECDSA) used in Intel SGX.

CRSep 29, 2021
Back in Black: A Comparative Evaluation of Recent State-Of-The-Art Black-Box Attacks

Kaleel Mahmood, Rigel Mahmood, Ethan Rathbun et al.

The field of adversarial machine learning has experienced a near exponential growth in the amount of papers being produced since 2018. This massive information output has yet to be properly processed and categorized. In this paper, we seek to help alleviate this problem by systematizing the recent advances in adversarial machine learning black-box attacks since 2019. Our survey summarizes and categorizes 20 recent black-box attacks. We also present a new analysis for understanding the attack success rate with respect to the adversarial model used in each paper. Overall, our paper surveys a wide body of literature to highlight recent attack developments and organizes them into four attack categories: score based attacks, decision based attacks, transfer attacks and non-traditional attacks. Further, we provide a new mathematical framework to show exactly how attack results can fairly be compared.

CVMar 31, 2021
On the Robustness of Vision Transformers to Adversarial Examples

Kaleel Mahmood, Rigel Mahmood, Marten van Dijk

Recent advances in attention-based networks have shown that Vision Transformers can achieve state-of-the-art or near state-of-the-art results on many image classification tasks. This puts transformers in the unique position of being a promising alternative to traditional convolutional neural networks (CNNs). While CNNs have been carefully studied with respect to adversarial attacks, the same cannot be said of Vision Transformers. In this paper, we study the robustness of Vision Transformers to adversarial examples. Our analyses of transformer security is divided into three parts. First, we test the transformer under standard white-box and black-box attacks. Second, we study the transferability of adversarial examples between CNNs and transformers. We show that adversarial examples do not readily transfer between CNNs and transformers. Based on this finding, we analyze the security of a simple ensemble defense of CNNs and transformers. By creating a new attack, the self-attention blended gradient attack, we show that such an ensemble is not secure under a white-box adversary. However, under a black-box adversary, we show that an ensemble can achieve unprecedented robustness without sacrificing clean accuracy. Our analysis for this work is done using six types of white-box attacks and two types of black-box attacks. Our study encompasses multiple Vision Transformers, Big Transfer Models and CNN architectures trained on CIFAR-10, CIFAR-100 and ImageNet.

LGFeb 17, 2021
Proactive DP: A Multple Target Optimization Framework for DP-SGD

Marten van Dijk, Nhuong V. Nguyen, Toan N. Nguyen et al.

We introduce a multiple target optimization framework for DP-SGD referred to as pro-active DP. In contrast to traditional DP accountants, which are used to track the expenditure of privacy budgets, the pro-active DP scheme allows one to a-priori select parameters of DP-SGD based on a fixed privacy budget (in terms of $ε$ and $δ$) in such a way to optimize the anticipated utility (test accuracy) the most. To achieve this objective, we first propose significant improvements to the moment account method, presenting a closed-form $(ε,δ)$-DP guarantee that connects all parameters in the DP-SGD setup. We show that DP-SGD is $(ε<0.5,δ=1/N)$-DP if $σ=\sqrt{2(ε+\ln(1/δ))/ε}$ with $T$ at least $\approx 2k^2/ε$ and $(2/e)^2k^2-1/2\geq \ln(N)$, where $T$ is the total number of rounds, and $K=kN$ is the total number of gradient computations where $k$ measures $K$ in number of epochs of size $N$ of the local data set. We prove that our expression is close to tight in that if $T$ is more than a constant factor $\approx 4$ smaller than the lower bound $\approx 2k^2/ε$, then the $(ε,δ)$-DP guarantee is violated. The above DP guarantee can be enhanced in thatDP-SGD is $(ε, δ)$-DP if $σ= \sqrt{2(ε+\ln(1/δ))/ε}$ with $T$ at least $\approx 2k^2/ε$ together with two additional, less intuitive, conditions that allow larger $ε\geq 0.5$. Our DP theory allows us to create a utility graph and DP calculator. These tools link privacy and utility objectives and search for optimal experiment setups, efficiently taking into account both accuracy and privacy objectives, as well as implementation goals. We furnish a comprehensive implementation flow of our proactive DP, with rigorous experiments to showcase the proof-of-concept.

LGOct 27, 2020
Hogwild! over Distributed Local Data Sets with Linearly Increasing Mini-Batch Sizes

Marten van Dijk, Nhuong V. Nguyen, Toan N. Nguyen et al.

Hogwild! implements asynchronous Stochastic Gradient Descent (SGD) where multiple threads in parallel access a common repository containing training data, perform SGD iterations and update shared state that represents a jointly learned (global) model. We consider big data analysis where training data is distributed among local data sets in a heterogeneous way -- and we wish to move SGD computations to local compute nodes where local data resides. The results of these local SGD computations are aggregated by a central "aggregator" which mimics Hogwild!. We show how local compute nodes can start choosing small mini-batch sizes which increase to larger ones in order to reduce communication cost (round interaction with the aggregator). We improve state-of-the-art literature and show $O(\sqrt{K}$) communication rounds for heterogeneous data for strongly convex problems, where $K$ is the total number of gradient computations across all local compute nodes. For our scheme, we prove a \textit{tight} and novel non-trivial convergence analysis for strongly convex problems for {\em heterogeneous} data which does not use the bounded gradient assumption as seen in many existing publications. The tightness is a consequence of our proofs for lower and upper bounds of the convergence rate, which show a constant factor difference. We show experimental results for plain convex and non-convex problems for biased (i.e., heterogeneous) and unbiased local data sets.

LGJul 17, 2020
Asynchronous Federated Learning with Reduced Number of Rounds and with Differential Privacy from Less Aggregated Gaussian Noise

Marten van Dijk, Nhuong V. Nguyen, Toan N. Nguyen et al.

The feasibility of federated learning is highly constrained by the server-clients infrastructure in terms of network communication. Most newly launched smartphones and IoT devices are equipped with GPUs or sufficient computing hardware to run powerful AI models. However, in case of the original synchronous federated learning, client devices suffer waiting times and regular communication between clients and server is required. This implies more sensitivity to local model training times and irregular or missed updates, hence, less or limited scalability to large numbers of clients and convergence rates measured in real time will suffer. We propose a new algorithm for asynchronous federated learning which eliminates waiting times and reduces overall network communication - we provide rigorous theoretical analysis for strongly convex objective functions and provide simulation results. By adding Gaussian noise we show how our algorithm can be made differentially private -- new theorems show how the aggregated added Gaussian noise is significantly reduced.

LGJun 18, 2020
Beware the Black-Box: on the Robustness of Recent Defenses to Adversarial Examples

Kaleel Mahmood, Deniz Gurevin, Marten van Dijk et al.

Many defenses have recently been proposed at venues like NIPS, ICML, ICLR and CVPR. These defenses are mainly focused on mitigating white-box attacks. They do not properly examine black-box attacks. In this paper, we expand upon the analysis of these defenses to include adaptive black-box adversaries. Our evaluation is done on nine defenses including Barrage of Random Transforms, ComDefend, Ensemble Diversity, Feature Distillation, The Odds are Odd, Error Correcting Codes, Distribution Classifier Defense, K-Winner Take All and Buffer Zones. Our investigation is done using two black-box adversarial models and six widely studied adversarial attacks for CIFAR-10 and Fashion-MNIST datasets. Our analyses show most recent defenses (7 out of 9) provide only marginal improvements in security ($<25\%$), as compared to undefended networks. For every defense, we also show the relationship between the amount of data the adversary has at their disposal, and the effectiveness of adaptive black-box attacks. Overall, our results paint a clear picture: defenses need both thorough white-box and black-box analyses to be considered secure. We provide this large scale study and analyses to motivate the field to move towards the development of more robust black-box defenses.

LGMar 1, 2020
A Hybrid Stochastic Policy Gradient Algorithm for Reinforcement Learning

Nhan H. Pham, Lam M. Nguyen, Dzung T. Phan et al.

We propose a novel hybrid stochastic policy gradient estimator by combining an unbiased policy gradient estimator, the REINFORCE estimator, with another biased one, an adapted SARAH estimator for policy optimization. The hybrid policy gradient estimator is shown to be biased, but has variance reduced property. Using this estimator, we develop a new Proximal Hybrid Stochastic Policy Gradient Algorithm (ProxHSPGA) to solve a composite policy optimization problem that allows us to handle constraints or regularizers on the policy parameters. We first propose a single-looped algorithm then introduce a more practical restarting variant. We prove that both algorithms can achieve the best-known trajectory complexity $\mathcal{O}\left(\varepsilon^{-3}\right)$ to attain a first-order stationary point for the composite problem which is better than existing REINFORCE/GPOMDP $\mathcal{O}\left(\varepsilon^{-4}\right)$ and SVRPG $\mathcal{O}\left(\varepsilon^{-10/3}\right)$ in the non-composite setting. We evaluate the performance of our algorithm on several well-known examples in reinforcement learning. Numerical results show that our algorithm outperforms two existing methods on these examples. Moreover, the composite settings indeed have some advantages compared to the non-composite ones on certain problems.

OCFeb 19, 2020
A Unified Convergence Analysis for Shuffling-Type Gradient Methods

Lam M. Nguyen, Quoc Tran-Dinh, Dzung T. Phan et al.

In this paper, we propose a unified convergence analysis for a class of generic shuffling-type gradient methods for solving finite-sum optimization problems. Our analysis works with any sampling without replacement strategy and covers many known variants such as randomized reshuffling, deterministic or randomized single permutation, and cyclic and incremental gradient schemes. We focus on two different settings: strongly convex and nonconvex problems, but also discuss the non-strongly convex case. Our main contribution consists of new non-asymptotic and asymptotic convergence rates for a wide class of shuffling-type gradient methods in both nonconvex and convex settings. We also study uniformly randomized shuffling variants with different learning rates and model assumptions. While our rate in the nonconvex case is new and significantly improved over existing works under standard assumptions, the rate on the strongly convex one matches the existing best-known rates prior to this paper up to a constant factor without imposing a bounded gradient condition. Finally, we empirically illustrate our theoretical results via two numerical examples: nonconvex logistic regression and neural network training examples. As byproducts, our results suggest some appropriate choices for diminishing learning rates in certain shuffling variants.

LGOct 3, 2019
BUZz: BUffer Zones for defending adversarial examples in image classification

Kaleel Mahmood, Phuong Ha Nguyen, Lam M. Nguyen et al.

We propose a novel defense against all existing gradient based adversarial attacks on deep neural networks for image classification problems. Our defense is based on a combination of deep neural networks and simple image transformations. While straightforward in implementation, this defense yields a unique security property which we term buffer zones. We argue that our defense based on buffer zones offers significant improvements over state-of-the-art defenses. We are able to achieve this improvement even when the adversary has access to the {\em entire} original training data set and unlimited query access to the defense. We verify our claim through experimentation using Fashion-MNIST and CIFAR-10: We demonstrate $<11\%$ attack success rate -- significantly lower than what other well-known state-of-the-art defenses offer -- at only a price of a $11-18\%$ drop in clean accuracy. By using a new intuitive metric, we explain why this trade-off offers a significant improvement over prior work.

CRMay 1, 2019
On the Convergence Rates of Learning-based Signature Generation Schemes to Contain Self-propagating Malware

Saeed Valizadeh, Marten van Dijk

In this paper, we investigate the importance of a defense system's learning rates to fight against the self-propagating class of malware such as worms and bots. To this end, we introduce a new propagation model based on the interactions between an adversary (and its agents) who wishes to construct a zombie army of a specific size, and a defender taking advantage of standard security tools and technologies such as honeypots (HPs) and intrusion detection and prevention systems (IDPSes) in the network environment. As time goes on, the defender can incrementally learn from the collected/observed attack samples (e.g., malware payloads), and therefore being able to generate attack signatures. The generated signatures then are used for filtering next attack traffic and thus containing the attacker's progress in its malware propagation mission. Using simulation and numerical analysis, we evaluate the efficacy of signature generation algorithms and in general any learning-based scheme in bringing an adversary's maneuvering in the environment to a halt as an adversarial containment strategy.

OCJan 22, 2019
Finite-Sum Smooth Optimization with SARAH

Lam M. Nguyen, Marten van Dijk, Dzung T. Phan et al.

The total complexity (measured as the total number of gradient computations) of a stochastic first-order optimization algorithm that finds a first-order stationary point of a finite-sum smooth nonconvex objective function $F(w)=\frac{1}{n} \sum_{i=1}^n f_i(w)$ has been proven to be at least $Ω(\sqrt{n}/ε)$ for $n \leq \mathcal{O}(ε^{-2})$ where $ε$ denotes the attained accuracy $\mathbb{E}[ \|\nabla F(\tilde{w})\|^2] \leq ε$ for the outputted approximation $\tilde{w}$ (Fang et al., 2018). In this paper, we provide a convergence analysis for a slightly modified version of the SARAH algorithm (Nguyen et al., 2017a;b) and achieve total complexity that matches the lower-bound worst case complexity in (Fang et al., 2018) up to a constant factor when $n \leq \mathcal{O}(ε^{-2})$ for nonconvex problems. For convex optimization, we propose SARAH++ with sublinear convergence for general convex and linear convergence for strongly convex problems; and we provide a practical version for which numerical experiments on various datasets show an improved performance.

LGJan 22, 2019
DTN: A Learning Rate Scheme with Convergence Rate of $\mathcal{O}(1/t)$ for SGD

Lam M. Nguyen, Phuong Ha Nguyen, Dzung T. Phan et al.

This paper has some inconsistent results, i.e., we made some failed claims because we did some mistakes for using the test criterion for a series. Precisely, our claims on the convergence rate of $\mathcal{O}(1/t)$ of SGD presented in Theorem 1, Corollary 1, Theorem 2 and Corollary 2 are wrongly derived because they are based on Lemma 5. In Lemma 5, we do not correctly use the test criterion for a series. Hence, the result of Lemma 5 is not valid. We would like to thank the community for pointing out this mistake!

CRJan 6, 2019
Toward a Theory of Cyber Attacks

Saeed Valizadeh, Marten van Dijk

We provide a general methodology for analyzing defender-attacker based "games" in which we model such games as Markov models and introduce a capacity region to analyze how defensive and adversarial strategies impact security. Such a framework allows us to analyze under what kind of conditions we can prove statements (about an attack objective $k$) of the form "if the attacker has a time budget $T_{bud}$, then the probability that the attacker can reach an attack objective $\geq k$ is at most $poly(T_{bud})negl(k)$". We are interested in such rigorous cryptographic security guarantees (that describe worst-case guarantees) as these shed light on the requirements of a defender's strategy for preventing more and more the progress of an attack, in terms of the "learning rate" of a defender's strategy. We explain the damage an attacker can achieve by a "containment parameter" describing the maximally reached attack objective within a specific time window.

OCNov 10, 2018
New Convergence Aspects of Stochastic Gradient Algorithms

Lam M. Nguyen, Phuong Ha Nguyen, Peter Richtárik et al.

The classical convergence analysis of SGD is carried out under the assumption that the norm of the stochastic gradient is uniformly bounded. While this might hold for some loss functions, it is violated for cases where the objective function is strongly convex. In Bottou et al. (2018), a new analysis of convergence of SGD is performed under the assumption that stochastic gradients are bounded with respect to the true gradient norm. We show that for stochastic problems arising in machine learning such bound always holds; and we also propose an alternative convergence analysis of SGD with diminishing learning rate regime. We then move on to the asynchronous parallel setting, and prove convergence of Hogwild! algorithm in the same regime in the case of diminished learning rate. It is well-known that SGD converges if a sequence of learning rates $\{η_t\}$ satisfies $\sum_{t=0}^\infty η_t \rightarrow \infty$ and $\sum_{t=0}^\infty η^2_t < \infty$. We show the convergence of SGD for strongly convex objective function without using bounded gradient assumption when $\{η_t\}$ is a diminishing sequence and $\sum_{t=0}^\infty η_t \rightarrow \infty$. In other words, we extend the current state-of-the-art class of learning rates satisfying the convergence of SGD.

OCOct 10, 2018
Tight Dimension Independent Lower Bound on the Expected Convergence Rate for Diminishing Step Sizes in SGD

Phuong Ha Nguyen, Lam M. Nguyen, Marten van Dijk

We study the convergence of Stochastic Gradient Descent (SGD) for strongly convex objective functions. We prove for all $t$ a lower bound on the expected convergence rate after the $t$-th SGD iteration; the lower bound is over all possible sequences of diminishing step sizes. It implies that recently proposed sequences of step sizes at ICML 2018 and ICML 2019 are {\em universally} close to optimal in that the expected convergence rate after {\em each} iteration is within a factor $32$ of our lower bound. This factor is independent of dimension $d$. We offer a framework for comparing with lower bounds in state-of-the-art literature and when applied to SGD for strongly convex objective functions our lower bound is a significant factor $775\cdot d$ larger compared to existing work.

OCOct 9, 2018
Characterization of Convex Objective Functions and Optimal Expected Convergence Rates for SGD

Marten van Dijk, Lam M. Nguyen, Phuong Ha Nguyen et al.

We study Stochastic Gradient Descent (SGD) with diminishing step sizes for convex objective functions. We introduce a definitional framework and theory that defines and characterizes a core property, called curvature, of convex objective functions. In terms of curvature we can derive a new inequality that can be used to compute an optimal sequence of diminishing step sizes by solving a differential equation. Our exact solutions confirm known results in literature and allows us to fully characterize a new regularizer with its corresponding expected convergence rates.

CRJul 29, 2018
TREVERSE: Trial-and-Error Lightweight Secure Reverse Authentication with Simulatable PUFs

Yansong Gao, Marten van Dijk, Lei Xu et al.

A physical unclonable function (PUF) generates hardware intrinsic volatile secrets by exploiting uncontrollable manufacturing randomness. Although PUFs provide the potential for lightweight and secure authentication for increasing numbers of low-end Internet of Things devices, practical and secure mechanisms remain elusive. We aim to explore simulatable PUFs (SimPUFs) that are physically unclonable but efficiently modeled mathematically through privileged one-time PUF access to address the above problem. Given a challenge, a securely stored SimPUF in possession of a trusted server computes the corresponding response and its bit-specific reliability. Consequently, naturally noisy PUF responses generated by a resource limited prover can be immediately processed by a one-way function (OWF) and transmitted to the server, because the resourceful server can exploit the SimPUF to perform a trial-and-error search over likely error patterns to recover the noisy response to authenticate the prover. Security of trial-and-error reverse (TREVERSE) authentication under the random oracle model is guaranteed by the hardness of inverting the OWF. We formally evaluate the TREVERSE authentication capability with two SimPUFs experimentally derived from popular silicon PUFs.

CRApr 13, 2018
Comments on "Defeating HaTCh: Building Malicious IP Cores"

Syed Kamran Haider, Chenglu Jin, Marten van Dijk

Recently, Haider et al. introduced the first rigorous hardware Trojan detection algorithm called HaTCh. The foundation of HaTCh is a formal framework of hardware Trojan design, which formally characterizes all the hardware Trojans based on its properties. However, Bhardwaj et al. recently published one paper "Defeating HaTCh: Building Malicious IP Cores", which incorrectly claims that their newly designed hardware Trojan can evade the detection by HaTCh. In this paper, we explain why the claim of "defeating HaTCh" is incorrect, and we clarify several common misunderstandings about HaTCh.

OCFeb 11, 2018
SGD and Hogwild! Convergence Without the Bounded Gradients Assumption

Lam M. Nguyen, Phuong Ha Nguyen, Marten van Dijk et al.

Stochastic gradient descent (SGD) is the optimization algorithm of choice in many machine learning applications such as regularized empirical risk minimization and training deep neural networks. The classical convergence analysis of SGD is carried out under the assumption that the norm of the stochastic gradient is uniformly bounded. While this might hold for some loss functions, it is always violated for cases where the objective function is strongly convex. In (Bottou et al.,2016), a new analysis of convergence of SGD is performed under the assumption that stochastic gradients are bounded with respect to the true gradient norm. Here we show that for stochastic problems arising in machine learning such bound always holds; and we also propose an alternative convergence analysis of SGD with diminishing learning rate regime, which results in more relaxed conditions than those in (Bottou et al.,2016). We then move on the asynchronous parallel setting, and prove convergence of Hogwild! algorithm in the same regime, obtaining the first convergence results for this method in the case of diminished learning rate.

CRJun 12, 2017
Revisiting Definitional Foundations of Oblivious RAM for Secure Processor Implementations

Syed Kamran Haider, Omer Khan, Marten van Dijk

Oblivious RAM (ORAM) is a renowned technique to hide the access patterns of an application to an untrusted memory. According to the standard ORAM definition presented by Goldreich and Ostrovsky, two ORAM access sequences must be computationally indistinguishable if the lengths of these sequences are identically distributed. An artifact of this definition is that it does not apply to modern ORAM implementations adapted in current secure processors technology because of their arbitrary lengths of memory access sequences depending on programs' behaviors (their termination times). As a result, the ORAM definition does not directly apply; the theoretical foundations of ORAM do not clearly argue about the timing and termination channels. This paper conducts a first rigorous study of the standard Goldreich-Ostrovsky ORAM definition in view of modern practical ORAMs (e.g., Path ORAM) and demonstrates the gap between theoretical foundations and real implementations. A new ORAM formulation which clearly separates out termination channel leakage is proposed. It is shown how this definition implies the standard ORAM definition (for finite length input access sequences) and better fits the modern practical ORAM implementations. The proposed definition relaxes the constraints around the stash size and overflow probability for Path ORAM, and essentially transforms its security argument into a performance consideration problem. Finally, a `strong' ORAM formulation which clearly includes obfuscation of termination leakage is shown to imply our new ORAM formulation and applies to ORAM for outsourced disk storage. In this strong formulation constraints are not relaxed and the security argument for Path ORAM remains complex as one needs to prove that the stash overflows with negligible probability.

CRMar 21, 2017
Intrinsically Reliable and Lightweight Physical Obfuscated Keys

Raihan Sayeed Khan, Nadim Kanan, Chenglu Jin et al.

Physical Obfuscated Keys (POKs) allow tamper-resistant storage of random keys based on physical disorder. The output bits of current POK designs need to be first corrected due to measurement noise and next de-correlated since the original output bits may not be i.i.d. (independent and identically distributed) and also public helper information for error correction necessarily correlates the corrected output bits.For this reason, current designs include an interface for error correction and/or output reinforcement, and privacy amplification for compressing the corrected output to a uniform random bit string. We propose two intrinsically reliable POK designs with only XOR circuitry for privacy amplification (without need for reliability enhancement) by exploiting variability of lithographic process and variability of granularity in phase change memory (PCM) materials. The two designs are demonstrated through experiments and simulations.

CRFeb 13, 2017
Connecting the Dots: Privacy Leakage via Write-Access Patterns to the Main Memory

Tara Merin John, Syed Kamran Haider, Hamza Omar et al.

Data-dependent access patterns of an application to an untrusted storage system are notorious for leaking sensitive information about the user's data. Previous research has shown how an adversary capable of monitoring both read and write requests issued to the memory can correlate them with the application to learn its sensitive data. However, information leakage through only the write access patterns is less obvious and not well studied in the current literature. In this work, we demonstrate an actual attack on power-side-channel resistant Montgomery's ladder based modular exponentiation algorithm commonly used in public key cryptography. We infer the complete 512-bit secret exponent in $\sim3.5$ minutes by virtue of just the write access patterns of the algorithm to the main memory. In order to learn the victim algorithm's write access patterns under realistic settings, we exploit a compromised DMA device to take frequent snapshots of the application's address space, and then run a simple differential analysis on these snapshots to find the write access sequence. The attack has been shown on an Intel Core(TM) i7-4790 3.60GHz processor based system. We further discuss a possible attack on McEliece public-key cryptosystem that also exploits the write-access patterns to learn the secret key.

ARNov 4, 2016
Flat ORAM: A Simplified Write-Only Oblivious RAM Construction for Secure Processors

Syed Kamran Haider, Marten van Dijk

Oblivious RAM (ORAM) is a cryptographic primitive which obfuscates the access patterns to a storage thereby preventing privacy leakage. So far in the current literature, only `fully functional' ORAMs are widely studied which can protect, at a cost of considerable performance penalty, against the strong adversaries who can monitor all read and write operations. However, recent research has shown that information can still be leaked even if only the write access pattern (not reads) is visible to the adversary. For such weaker adversaries, a fully functional ORAM turns out to be an overkill causing unnecessary overheads. Instead, a simple `write-only' ORAM is sufficient, and, more interestingly, is preferred as it can offer far more performance and energy efficiency than a fully functional ORAM. In this work, we present Flat ORAM: an efficient write-only ORAM scheme which outperforms the closest existing write-only ORAM called HIVE. HIVE suffers from performance bottlenecks while managing the memory occupancy information vital for correctness of the protocol. Flat ORAM resolves these bottlenecks by introducing a simple idea of Occupancy Map (OccMap) which efficiently manages the memory occupancy information resulting in far better performance. Our simulation results show that, on average, Flat ORAM only incurs a moderate slowdown of $3\times$ over the insecure DRAM for memory intensive benchmarks among Splash2 and $1.6\times$ for SPEC06. Compared to HIVE, Flat ORAM offers $50\%$ performance gain on average and up to $80\%$ energy savings.

CRMay 26, 2016
Advancing the State-of-the-Art in Hardware Trojans Design

Syed Kamran Haider, Chenglu Jin, Marten van Dijk

Electronic Design Automation (EDA) industry heavily reuses third party IP cores. These IP cores are vulnerable to insertion of Hardware Trojans (HTs) at design time by third party IP core providers or by malicious insiders in the design team. State of the art research has shown that existing HT detection techniques, which claim to detect all publicly available HT benchmarks, can still be defeated by carefully designing new sophisticated HTs. The reason being that these techniques consider the HT landscape to be limited only to the publicly known HT benchmarks, or other similar (simple) HTs. However the adversary is not limited to these HTs and may devise new HT design principles to bypass these countermeasures. In this paper, we discover certain crucial properties of HTs which lead to the definition of an exponentially large class of Deterministic Hardware Trojans $H_D$ that an adversary can (but is not limited to) design. The discovered properties serve as HT design principles, based on which we design a new HT called 'XOR-LFSR' and present it as a 'proof-of-concept' example from the class $H_D$. These design principles help us understand the tremendous ways an adversary has to design a HT, and show that the existing publicly known HT benchmarks are just the tip of the iceberg on this huge landscape. This work, therefore, stresses that instead of guaranteeing a certain (low) false negative rate for a small constant set of publicly known HTs, a rigorous HT detection tool should take into account these newly discovered HT design principles and hence guarantee the detection of an exponentially large class (exponential in number of wires in IP core) of HTs with negligible false negative rate.

CRFeb 23, 2012
Path ORAM: An Extremely Simple Oblivious RAM Protocol

Emil Stefanov, Marten van Dijk, Elaine Shi et al.

We present Path ORAM, an extremely simple Oblivious RAM protocol with a small amount of client storage. Partly due to its simplicity, Path ORAM is the most practical ORAM scheme known to date with small client storage. We formally prove that Path ORAM has a O(log N) bandwidth cost for blocks of size B = Omega(log^2 N) bits. For such block sizes, Path ORAM is asymptotically better than the best known ORAM schemes with small client storage. Due to its practicality, Path ORAM has been adopted in the design of secure processors since its proposal.