Nhuong V. Nguyen

h-index6

5papers

43citations

Novelty47%

AI Score29

Ranked #145,257 of 194,257 authors (top 75%)#31,977 in LG (top 80%)

5 Papers

3.1LGDec 27, 2021Code

AET-SGD: Asynchronous Event-triggered Stochastic Gradient Descent

Nhuong Nguyen, Song Han

Communication cost is the main bottleneck for the design of effective distributed learning algorithms. Recently, event-triggered techniques have been proposed to reduce the exchanged information among compute nodes and thus alleviate the communication cost. However, most existing event-triggered approaches only consider heuristic event-triggered thresholds. They also ignore the impact of computation and network delay, which play an important role on the training performance. In this paper, we propose an Asynchronous Event-triggered Stochastic Gradient Descent (SGD) framework, called AET-SGD, to i) reduce the communication cost among the compute nodes, and ii) mitigate the impact of the delay. Compared with baseline event-triggered methods, AET-SGD employs a linear increasing sample size event-triggered threshold, and can significantly reduce the communication cost while keeping good convergence performance. We implement AET-SGD and evaluate its performance on multiple representative data sets, including MNIST, FashionMNIST, KMNIST and CIFAR10. The experimental results validate the correctness of the design and show a significant communication cost reduction from 44x to 120x, compared to the state of the art. Our results also show that AET-SGD can resist large delay from the straggler nodes while obtaining a decent performance and a desired speedup ratio.

1.6LGJun 6, 2021Code

Distributed Learning and its Application for Time-Series Prediction

Nhuong V. Nguyen, Sybille Legitime

Extreme events are occurrences whose magnitude and potential cause extensive damage on people, infrastructure, and the environment. Motivated by the extreme nature of the current global health landscape, which is plagued by the coronavirus pandemic, we seek to better understand and model extreme events. Modeling extreme events is common in practice and plays an important role in time-series prediction applications. Our goal is to (i) compare and investigate the effect of some common extreme events modeling methods to explore which method can be practical in reality and (ii) accelerate the deep learning training process, which commonly uses deep recurrent neural network (RNN), by implementing the asynchronous local Stochastic Gradient Descent (SGD) framework among multiple compute nodes. In order to verify our distributed extreme events modeling, we evaluate our proposed framework on a stock data set S\&P500, with a standard recurrent neural network. Our intuition is to explore the (best) extreme events modeling method which could work well under the distributed deep learning setting. Moreover, by using asynchronous distributed learning, we aim to significantly reduce the communication cost among the compute nodes and central server, which is the main bottleneck of almost all distributed learning frameworks. We implement our proposed work and evaluate its performance on representative data sets, such as S&P500 stock in $5$-year period. The experimental results validate the correctness of the design principle and show a significant training duration reduction upto $8$x, compared to the baseline single compute node. Our results also show that our proposed work can achieve the same level of test accuracy, compared to the baseline setting.

1.6LGFeb 17, 2021

Proactive DP: A Multple Target Optimization Framework for DP-SGD

Marten van Dijk, Nhuong V. Nguyen, Toan N. Nguyen et al.

We introduce a multiple target optimization framework for DP-SGD referred to as pro-active DP. In contrast to traditional DP accountants, which are used to track the expenditure of privacy budgets, the pro-active DP scheme allows one to a-priori select parameters of DP-SGD based on a fixed privacy budget (in terms of $ε$ and $δ$) in such a way to optimize the anticipated utility (test accuracy) the most. To achieve this objective, we first propose significant improvements to the moment account method, presenting a closed-form $(ε,δ)$-DP guarantee that connects all parameters in the DP-SGD setup. We show that DP-SGD is $(ε<0.5,δ=1/N)$-DP if $σ=\sqrt{2(ε+\ln(1/δ))/ε}$ with $T$ at least $\approx 2k^2/ε$ and $(2/e)^2k^2-1/2\geq \ln(N)$, where $T$ is the total number of rounds, and $K=kN$ is the total number of gradient computations where $k$ measures $K$ in number of epochs of size $N$ of the local data set. We prove that our expression is close to tight in that if $T$ is more than a constant factor $\approx 4$ smaller than the lower bound $\approx 2k^2/ε$, then the $(ε,δ)$-DP guarantee is violated. The above DP guarantee can be enhanced in thatDP-SGD is $(ε, δ)$-DP if $σ= \sqrt{2(ε+\ln(1/δ))/ε}$ with $T$ at least $\approx 2k^2/ε$ together with two additional, less intuitive, conditions that allow larger $ε\geq 0.5$. Our DP theory allows us to create a utility graph and DP calculator. These tools link privacy and utility objectives and search for optimal experiment setups, efficiently taking into account both accuracy and privacy objectives, as well as implementation goals. We furnish a comprehensive implementation flow of our proactive DP, with rigorous experiments to showcase the proof-of-concept.

5.8LGOct 27, 2020

Hogwild! over Distributed Local Data Sets with Linearly Increasing Mini-Batch Sizes

Marten van Dijk, Nhuong V. Nguyen, Toan N. Nguyen et al.

Hogwild! implements asynchronous Stochastic Gradient Descent (SGD) where multiple threads in parallel access a common repository containing training data, perform SGD iterations and update shared state that represents a jointly learned (global) model. We consider big data analysis where training data is distributed among local data sets in a heterogeneous way -- and we wish to move SGD computations to local compute nodes where local data resides. The results of these local SGD computations are aggregated by a central "aggregator" which mimics Hogwild!. We show how local compute nodes can start choosing small mini-batch sizes which increase to larger ones in order to reduce communication cost (round interaction with the aggregator). We improve state-of-the-art literature and show $O(\sqrt{K}$) communication rounds for heterogeneous data for strongly convex problems, where $K$ is the total number of gradient computations across all local compute nodes. For our scheme, we prove a \textit{tight} and novel non-trivial convergence analysis for strongly convex problems for {\em heterogeneous} data which does not use the bounded gradient assumption as seen in many existing publications. The tightness is a consequence of our proofs for lower and upper bounds of the convergence rate, which show a constant factor difference. We show experimental results for plain convex and non-convex problems for biased (i.e., heterogeneous) and unbiased local data sets.

13.6LGJul 17, 2020Code

Asynchronous Federated Learning with Reduced Number of Rounds and with Differential Privacy from Less Aggregated Gaussian Noise

Marten van Dijk, Nhuong V. Nguyen, Toan N. Nguyen et al.

The feasibility of federated learning is highly constrained by the server-clients infrastructure in terms of network communication. Most newly launched smartphones and IoT devices are equipped with GPUs or sufficient computing hardware to run powerful AI models. However, in case of the original synchronous federated learning, client devices suffer waiting times and regular communication between clients and server is required. This implies more sensitivity to local model training times and irregular or missed updates, hence, less or limited scalability to large numbers of clients and convergence rates measured in real time will suffer. We propose a new algorithm for asynchronous federated learning which eliminates waiting times and reduces overall network communication - we provide rigorous theoretical analysis for strongly convex objective functions and provide simulation results. By adding Gaussian noise we show how our algorithm can be made differentially private -- new theorems show how the aggregated added Gaussian noise is significantly reduced.